On Wednesday, November 2nd, the Chicago Cubs proved that anyone can come back and win on the world’s biggest stage. The following Tuesday, November 8th, Donald Trump did what was also thought to be impossible, winning the election to become the 45th president of the United States.
As election day drew closer, the media, and therefore the public, depended more and more on polling predictions and data visualization, often using slick interactive maps and charts to illustrate the many winning paths Hillary Clinton had, and the many obstacles Trump would have to overcome. It was safe to say that there was an overwhelming consensus among pollsters and data journalists that Hillary Clinton would emerge victorious Tuesday night. However, what was supposed to be a relatively early election night turned into a late-night apocalypse, leaving people across the country wondering: how the hell did this happen?
Now that the initial shock has subsided, reputable polls and aggregators are bearing the brunt of criticism for getting it so wrong. While pollsters will spend many months trying to figure out exactly why their results were off, is it fair to say that Donald Trump’s surprise victory was a failure of big data? Not really. The answer is far more complicated than that.
First off, the national polls weren’t that far off. Yes, Donald Trump won the electoral college, securing his presidency. But according to ABC News, Hillary Clinton’s popular vote lead now exceeds 1.5 million votes, and as more ballots come in, that number will continue to grow. That’s around 2.7 points off of Real Clear Politics‘ final polling average estimating Clinton’s lead over Trump. So the polls and data journalists were right in predicting that more people would vote for Clinton. The problem was that the media depended too much on national polling averages, often citing those polls to write Trump off despite his roughly one in three chance of winning. It was the state polls, particularly in the rural areas of the Rust Belt where Trump thrived, that should have raised some skepticism. Just a few days before the election, FiveThirtyEight’s Nate Silver picked up on how the state polls illustrated a far more competitive race, not an electoral map neatly aligned in Clinton’s favor. According to Silver, Clinton was down by about 3 percentage points in battleground states like Pennsylvania. Polls were also tightening, with more undecided voters, in Michigan, another key state in the electoral college. Trump’s strength in rural Pennsylvania ultimately helped him carry the state. Clinton won Genesee County, home of Flint, Michigan, but severely underperformed in other areas, handing the state to Trump.
This leads me to the current debate about voters who said they were unsure of whom they would vote for, even just a few days before the election. Pollsters and political analysts have a theory that this group of so-called ‘undecided voters’ made up a large portion of the ‘hidden’ Trump vote.
A study done in a public opinion course at Cornell University tested the hidden Trump vote theory. Like many polling surveys, including our own Marist Poll’s, their survey asked, “If the presidential election were being held today, whom would you support: Hillary Clinton, the Democratic candidate; Donald Trump, the Republican candidate; other; or are you unsure or don’t intend to vote?” They found that 20 percent of respondents indicated that they did not intend to vote, supported another candidate or did not want to answer the question. Those respondents who remained uncommitted to either major candidate (or to voting) were more likely to view Trump as more truthful, offering further evidence of this secret Trump vote. However, pollsters are finding that there is not a strong difference between how respondents answered phone polls versus online polls, even though one would assume that more people would answer honestly in internet surveys. Despite the narrow statistical evidence, psychologically it makes sense that Trump supporters might have felt uncomfortable or even embarrassed to respond honestly, given the many controversial, offensive things Trump said on the campaign trail.
One of the problems with this election cycle’s forecasts was the increased pressure on pollsters to rapidly produce large quantities of data, resulting in the declining quality of some of their predictions. Director of the Marist Poll Lee Miringoff expressed his frustrations with the polling and data journalism: “We saw in this election an avalanche of new polling methodologies with unproven track records . . . online polls, panel polls, internet polls. They can be quicker and cost less, but they’re unproven and not particularly successful.” Marist Poll’s November national results were among the closest, with Trump at 43 percent and Clinton at 44 percent, as opposed to the Monmouth and AP-GfK polls, which had Clinton leading by more than 5 percentage points. Although he sees the incredible value of data journalism, Miringoff said aggregating polls from here and there and developing a prediction could be filled with errors: “There’s a problem in the aggregators who just pile all the polls into these visualizations, where they are all treated equally. This can be very misleading for people who want to follow elections closely.” Still, going forward, Miringoff said the Marist Poll, along with the entire polling world, has a lot of work to do.
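Miringoff’s complaint about aggregators treating every poll equally can be made concrete with a small sketch. The poll numbers and quality weights below are entirely hypothetical, invented for illustration; they are not actual 2016 results or any aggregator’s real weighting scheme.

```python
# Hypothetical sketch: how an aggregate can shift when polls are
# weighted by methodological quality instead of treated equally.
# All leads and weights are invented for illustration.

polls = [
    # (clinton_lead_pts, quality_weight)
    (5.0, 0.3),   # opt-in online panel, unproven track record
    (6.0, 0.2),   # internet poll, unproven methodology
    (1.0, 1.0),   # live-interviewer phone poll, strong track record
    (2.0, 1.0),   # live-interviewer phone poll, strong track record
]

# Equal-weight average: every poll counts the same.
equal_avg = sum(lead for lead, _ in polls) / len(polls)

# Quality-weighted average: proven methodologies count for more.
weighted_avg = (sum(lead * w for lead, w in polls)
                / sum(w for _, w in polls))

print(f"Equal-weight average:     Clinton +{equal_avg:.2f}")
print(f"Quality-weighted average: Clinton +{weighted_avg:.2f}")
```

With these invented figures, the equal-weight average shows Clinton up 3.5 points while the quality-weighted average shows her up only about 2.3, illustrating how piling unproven polls into a visualization on equal footing can inflate an apparent lead.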
Political pundits, pollsters and journalists could spend years picking apart the various things that might have led to the enormous number of faulty predictions in this election. But the bottom line is that demographics and numerical data are not destiny. The American media and its audience were wrongly treating polling forecasts as infallible. This was an unprecedented election, with two unprecedented candidates. And no one could predict the unpredictable.