Trump won. Polls lost. So what have we learned for our own data collection processes? by Bryan Melmed
Bryan Melmed unpacks how nearly every poll conducted around the presidential election came to the wrong conclusion because its results were systematically biased in Clinton’s favour.
Just days after the shock result of a Trump presidency, a few pundits had already completed their analysis of how it happened. Wouldn’t you know, these conclusions validated everything they were saying all along, and the companies they represent were obvious solutions to whatever problems were identified.
That isn’t analysis, it is rationalisation. Fellow marketers, let’s leave that sort of thinking to politicians.
To actually understand what happened, much less how it should change the way things work, will take some time. And actual thinking. As Patrick Murray, director of the Monmouth University Polling Institute, said to Nate Silver — if anyone thinks they have the answer right now, they’re just guessing.
For those of us working in the data sciences, there’s a strange sense of shared responsibility for the polls that led so many people to believe that Hillary would be the president-elect. In the days after the election, I fielded more questions about surveys and modelling than I encountered during the entire campaign.
As Obama might say, this is a teachable moment. Let’s unpack what we know so far.
The polls were wrong
There is no avoiding the fact that almost every poll conducted during this election came to the wrong conclusion. Even the Trump campaign, until a week before the election, thought they had a one-in-five chance of winning.
Then again, we should expect any survey with a low response rate to be unreliable. The people most willing to respond are often also more likely to answer one way rather than another, which skews the result.
Pollsters try to adjust for this by weighting responses differently. One extreme example comes from the USC poll, where the preference of one young African American was considered 30 times more important than that of the average panellist.
Weighting can work if demographics or other observed characteristics are highly correlated with preferences. But as marketers know all too well, demographic information is less useful every year. And if you can’t compare your assumptions to what actually happens – well, you’re just making assumptions.
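As a rough illustration of what that weighting involves, here is a minimal sketch of post-stratification in Python. The responses, group shares and electorate mix are all invented for the example; real pollsters weight on many more variables.

```python
import pandas as pd

# Hypothetical raw responses: each row is one respondent.
responses = pd.DataFrame({
    "education": ["college", "college", "college", "no_college", "no_college"],
    "candidate": ["Clinton", "Clinton", "Trump", "Trump", "Clinton"],
})

# Share of each group in the sample vs. an assumed share in the electorate.
sample_share = responses["education"].value_counts(normalize=True)
electorate_share = pd.Series({"college": 0.40, "no_college": 0.60})  # assumption

# Weight each respondent so the weighted sample matches the assumed electorate.
responses["weight"] = responses["education"].map(electorate_share / sample_share)

# Weighted candidate preference.
weighted = responses.groupby("candidate")["weight"].sum() / responses["weight"].sum()
print(weighted)
```

The sketch also shows why weighting only helps when the grouping variable actually tracks preference: if education (or age, or race) is a poor proxy, scaling by it just amplifies the noise from a handful of respondents, which is how you end up with one panellist counting 30 times over.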
Pollsters should have realised this sooner. Trump’s data team did, with only a few days to spare. They considered an alternate scenario where it was rural voters that flooded the polls. Even then the model only gave them a 30% chance of winning, but it pointed them in the right direction.
After the election, we learned the right approach was to overweight whites without a college degree, who were both less likely to answer a survey and more likely to show up at a voting booth.
Still, let’s not exaggerate the problem. This was a very close election. Trump won the Electoral College on the strength of roughly 107,000 votes across three states, and lost the popular vote. Polls this year were about as accurate as they were in 2012, when they underestimated Obama’s appeal.
The failure of polling was not in the margin of error, but that results were systematically biased in Clinton’s favour. It wasn’t a statistical problem that researchers could easily control for, or even recognise. This was data in a bubble, without regard to other information that might have alerted researchers that something was off – the enormous crowds at Donald Trump rallies, for example.
Do marketers make the same mistakes with data collection? All the time
FiveThirtyEight 2016 Election Forecast put Clinton clearly in the lead.
Not only do we assume our data is unbiased – it’s shocking how little testing is done – but we also rely on shortcuts to gauge the impact of our efforts. If the truth is more expensive or even just more difficult to understand, truth never wins. Truth never even sees the light of day. The inevitable result is millions of dollars wasted on bad assumptions and empty promises.
Here’s one example. We know that video advertising targeting a highly qualified audience costs more, has fewer engagements and registers less time spent. Most agency balance sheets would consider that a loss. At Exponential, we’ve recently completed research that shows this careful targeting is still far more effective than finding eager viewers who have no likely path to purchase.
It’s hard to understand probability
Most people weren’t following the polls themselves anyway, but the predictions based on them. There were many of these, the most popular being the New York Times Upshot and Nate Silver’s FiveThirtyEight. They all favoured Hillary Clinton, giving her a chance of winning somewhere between 65 and 98 per cent. The betting markets had settled around 80 per cent.
Even a model with a 98 per cent chance of Clinton winning, courtesy of the Huffington Post, is just a prediction. The same model finds a two per cent probability that Clinton loses. That’s low, but not so low that it will never happen. If there was a two per cent chance of an earthquake today, most of us would be cowering in a corner somewhere. Of course this Huffington Post model was wildly optimistic, not considering that the polls might be systematically biased, as described above.
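To see what “two per cent” means in practice, here is a minimal simulation. The 0.02 figure is the Huffington Post model’s stated probability of a Clinton loss; everything else is illustrative.

```python
import random

random.seed(42)

p_upset = 0.02          # the model's stated probability of a Clinton loss
n_elections = 10_000    # imagine re-running the election many times

upsets = sum(random.random() < p_upset for _ in range(n_elections))
print(f"Upsets: {upsets} out of {n_elections} ({upsets / n_elections:.1%})")
# Roughly 200 of 10,000 runs end in the 'unthinkable' outcome.
```

A one-in-fifty event is rare on any given day, but it is not the same as impossible; the model never said “certain”, even if readers heard it that way.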
Another aspect of probability is that some models were bound to predict the election correctly, but only by chance. In other words, some predictions were wrong, but in the right way. That USC poll is one example. It is getting attention because it predicted Trump winning – but it showed Trump three points ahead in the popular vote and the methodology behind the poll is even more questionable.
Marketers need to brush up on these lessons as well.
For example, we’re often given case studies where the vaunted outcome is simply a coincidence. Look closely and there aren’t enough people considered to arrive at a statistically significant result. If you’re wondering how this is sustainable, consider that in our industry an effort that arrives at a different answer is easily swept under the rug.
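A quick way to sanity-check such a case study, assuming the sample sizes and conversion rates are reported at all, is a simple two-proportion significance test. The numbers below are invented to show how small cells produce impressive-sounding but meaningless lifts.

```python
from statistics import NormalDist

# Hypothetical case-study numbers: 50 people per cell, a 2-point lift.
n_control, conv_control = 50, 5     # 10% conversion
n_test, conv_test = 50, 6           # 12% conversion

p1, p2 = conv_control / n_control, conv_test / n_test
p_pool = (conv_control + conv_test) / (n_control + n_test)
se = (p_pool * (1 - p_pool) * (1 / n_control + 1 / n_test)) ** 0.5
z = (p2 - p1) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, p-value = {p_value:.2f}")
# With 50 people per cell, a 2-point lift is nowhere near significant.
```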
Few people were thinking this through
From the moment Trump slowly descended an escalator to announce his candidacy, we were continually amazed by this unpredictable election. The only predictable thing was how Trump would prove everyone wrong. And still, we put our faith in the old truisms on how this would all play out. Practically the entire Clinton campaign assumed that a ‘ground game’ and a data-driven strategy would again be the deciding factor. It wasn’t.
Marketers rarely make this mistake – and if they do, they don’t last very long.
If anything, we’re too focused on the next big thing. I’m old enough to remember being asked what our strategy was for Second Life. Now I see publishers rushing to create video content, collectively racing towards a low quality, oversaturated market. Even Verizon’s go90 couldn’t force this strategy. Internet video is going to be the Atari of our day.
The sad thing is that we are missing out on better ways to monetise page-based content, specifically with opt-in video advertising. Why not meet consumers where they prefer to be?
Marketers are in a better place
Pollsters still rely on snapshots of attitudes. These aren’t always predictive and they age quickly. In contrast, today’s marketer can observe real-time consumer behaviour. And, because the business cycle moves a lot faster than every four years, we have ample opportunity to test our data and improve the models we work with.
Our biggest risk is complacency.
When asked what lessons the election held for marketers, Dr Joseph Plummer from Columbia Business School replied: “Don’t get too in love with data… and use your own sense of what makes sense.”
The Clinton campaign had too much faith in their data. Strategic decisions relied on a secret algorithm that the campaign planned to unveil after their victory. ‘Ada’ ran 400,000 simulations a day to identify which battleground states were likely to tip the race.
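The campaign never published Ada’s internals, but the basic idea of simulating the race to find tipping-point states can be sketched with an ordinary Monte Carlo run. The state list, poll margins, safe electoral votes and error assumption below are purely illustrative.

```python
import random

random.seed(0)

# Illustrative battlegrounds: (electoral votes, assumed Clinton poll margin).
states = {"PA": (20, 0.04), "MI": (16, 0.05), "WI": (10, 0.06), "FL": (29, 0.01)}
base_ev = 232            # electoral votes assumed safe for Clinton
polling_error = 0.04     # assumed std dev of a single, shared polling miss

def simulate():
    miss = random.gauss(0, polling_error)   # one correlated error across all states
    ev = base_ev
    for votes, margin in states.values():
        if margin + miss > 0:
            ev += votes
    return ev >= 270

wins = sum(simulate() for _ in range(100_000))
print(f"Clinton wins in {wins / 100_000:.0%} of simulations")
```

Even a toy version makes the point: when the polling error is shared across states, the ‘blue wall’ either holds or falls together, and a model fed only the polls will keep reporting comfortable odds right up until it doesn’t.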
In an article not focused on the election, data scientist Cathy O’Neil points out the inevitable result: “Even when they are doing their very best, data scientists can end up with an algorithm that’s got a questionable definition of success and is trained by data that has cooked-in biases… The problem is the blind faith; people are turning too much power over to the algorithm.”
Clinton didn’t make a single visit to the ‘blue wall’ state of Wisconsin. It was “nothing short of malpractice” said Democratic pollster Paul Maslin.
Are you making the same mistake with your campaigns?