Why Google Flu Trends Missed the Mark So Badly

by Angela Guess

Martin Willcox of Teradata recently wrote in Forbes, “Unless you have just returned to Earth after a short break on Mars, you will have noted that some of the shine has come off the big data bandwagon lately. Two academic papers that may have escaped your attention can help us to understand why – but also demonstrate that the naysayers are as misguided in their cynicism as the zealots are in their naïvety. Google Flu Trends (GFT) was once held up as the prototypical example of the power of big data. By leveraging search term data – apparently worthless ‘data exhaust’ – a group of Data Scientists with little relevant expertise were able to predict the spread of flu across the continental United States… Except that they weren’t.”

Willcox goes on, “We now know that GFT systematically over-estimated cases – and was likely predicting winter, not flu.  The first paper attempts to be even-handed and magnanimous in its analysis of what went wrong – and even succeeds, for the most part – but the label that the authors give to one of the mistakes made by the Google team (‘Big Data Hubris’) rather gives the game away.”
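
The “predicting winter, not flu” diagnosis is worth unpacking: if the search terms a model latches onto merely correlate with the season, the model ends up forecasting the calendar, and it will systematically over-shoot whenever a flu season turns out milder than the seasons it was trained on. Below is a minimal sketch of that failure mode on synthetic data, using ordinary least squares with a single winter-correlated predictor; the data, the term, and the model are illustrative assumptions, not the actual GFT methodology.

```python
import numpy as np

rng = np.random.default_rng(0)
weeks = np.arange(156)                     # three years of weekly data
season = np.cos(2 * np.pi * weeks / 52)    # +1 at each winter peak

# True flu incidence: seasonal in years 1-2, unusually mild in year 3.
amplitude = np.r_[60 * np.ones(104), 20 * np.ones(52)]
flu = 100 + amplitude * season + rng.normal(0, 5, 156)

# A hypothetical winter-correlated search term (say, "high school
# basketball"): it tracks the season itself, not the disease.
searches = 50 + 40 * season + rng.normal(0, 5, 156)

# Fit ordinary least squares on years 1-2, then forecast year 3.
X = np.c_[np.ones(104), searches[:104]]
beta, *_ = np.linalg.lstsq(X, flu[:104], rcond=None)
pred = np.c_[np.ones(52), searches[104:]] @ beta

winter = season[104:] > 0.5                # winter weeks of year 3
print(f"actual flu, year-3 winter:    {flu[104:][winter].mean():6.1f}")
print(f"predicted flu, year-3 winter: {pred[winter].mean():6.1f}")
```

The regression fits the first two years almost perfectly, yet its year-3 winter forecast comes out far too high, because the predictor carries information about the time of year rather than about the disease – systematic over-estimation of exactly the kind the papers describe.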

He continues, “Traditional approaches to analytics – what you might call the ‘correlation is not causality’ school – have emphasised the importance of rigorous statistical method and understanding of the problem space. By contrast, some of what we might characterise as the ‘unreasonable effectiveness of data’ crowd have gone so far as to claim that understanding is over-rated – and that with a big enough bucket of data, there is no question that they can’t answer, even if it is only ‘what’ that is known, not ‘why’.”

Read more here.

photo credit: Flickr/JeepersMedia
