by Angela Guess
Kathleen Hickey recently wrote in GCN, “While big data analytics have been touted for their ability to find signals in a sea of noise, they cannot tell what those signals mean. Without a solid grasp of what data is being mined, knowledge of its accuracy and why and how it is being mined, big data can end up causing more problems than it solves. This problem can be most acutely seen in the public health arena, where the amount of data is increasing exponentially. ‘Paradoxically, the proportion of false alarms among all proposed “findings” may increase when one can measure more things,’ Muin Khoury and John Ioannidis wrote in a recent report, ‘Big Data Meets Public Health.’ That’s what happened when Google dramatically overestimated peak flu levels, basing the analysis on flu-related Internet searches.”
Hickey goes on, “Analytics, in other words, is only as good as its data foundation — which, in some cases, is shaky. ‘Research accuracy is dictated by the weakest link,’ the authors said, with current analytics often based on ‘convenient samples of people or information available on the Internet.’ Information gleaned from the Internet needs to be integrated with other data and interpreted with ‘knowledge management, knowledge synthesis and knowledge translation,’ Khoury and Ioannidis stated. Machine learning algorithms can help — although, again, as Microsoft learned when its Twitter bot Tay went off the rails, parameters must be set to avoid havoc when data is collected.”
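Hickey’s closing point about setting parameters when data is collected can be made concrete. The sketch below is illustrative only; the function, blocklist, and thresholds are hypothetical and are not drawn from the GCN article, the Khoury and Ioannidis report, or anything Microsoft has published. It simply shows the kind of guardrail a team might place between raw, crowd-sourced input and a system that learns from it.

```python
# Illustrative sketch only: a simple collection-time guardrail between raw,
# crowd-sourced text and a model that trains on it. All names and thresholds
# here are hypothetical placeholders.

import re

BLOCKLIST = {"badword1", "badword2"}  # placeholder terms a real system would maintain
MIN_ALPHA_RATIO = 0.6                 # reject strings that are mostly symbols or noise
MAX_LENGTH = 280

def is_trainable(message: str) -> bool:
    """Return True only if a message passes basic collection-time checks."""
    text = message.strip().lower()
    if not text or len(text) > MAX_LENGTH:
        return False
    # Reject inputs dominated by non-alphabetic noise (links, emoji floods, etc.).
    alpha_ratio = sum(c.isalpha() for c in text) / len(text)
    if alpha_ratio < MIN_ALPHA_RATIO:
        return False
    # Reject anything containing a blocklisted token.
    tokens = set(re.findall(r"[a-z']+", text))
    return not (tokens & BLOCKLIST)

incoming = [
    "Flu season is starting early this year",
    "!!!$$$###",
    "badword1 something hateful",
]
training_corpus = [m for m in incoming if is_trainable(m)]
print(training_corpus)  # only the first message survives the filter
```

The filter is deliberately crude; the point is that validation happens before the data ever reaches the model, which is the stage the quoted authors argue deserves far more attention.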
Photo credit: Flickr