What can digital disease detection learn from (an external revision to) Google Flu Trends?

Am J Prev Med. 2014 Sep;47(3):341-7. doi: 10.1016/j.amepre.2014.05.020. Epub 2014 Jul 2.

Abstract

Background: Google Flu Trends (GFT) claimed to generate real-time, valid predictions of population influenza-like illness (ILI) from search queries, which earned it acclaim and prompted replication across public health. However, recent studies have questioned the validity of GFT.

Purpose: To propose an alternative methodology that better realizes the potential of GFT, with collateral value for digital disease detection broadly.

Methods: Our alternative method, developed in 2013, automatically selects specific queries to monitor and autonomously updates the model each week as new information about CDC-reported ILI becomes available. Root mean squared errors (RMSEs) and Pearson correlations comparing predicted ILI (proportion of patient visits indicative of ILI) with subsequently observed ILI were used to judge model performance.
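
The two performance measures named in the Methods can be illustrated with a minimal sketch. It assumes weekly predicted and observed ILI are available as equal-length lists of proportions; the function names `rmse` and `pearson_r` are hypothetical and are not taken from the study.

```python
import math


def rmse(predicted, observed):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(predicted, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    # Sums of centered cross-products and centered squares.
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_o)
```

RMSE is in the same units as ILI, so it describes the typical size of a weekly error, whereas the correlation only measures how closely predictions track the shape of the observed curve. A model can therefore score well on one measure and poorly on the other.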

Results: During the height of the H1N1 pandemic (August 2 to December 22, 2009) and the 2012-2013 season (September 30, 2012, to April 12, 2013), GFT's predictions had RMSEs of 0.023 and 0.022 (i.e., hypothetically, if GFT predicted 0.061 ILI one week, it would be expected to err by about 0.023) and correlations of r=0.916 and 0.927. Our alternative method had RMSEs of 0.006 and 0.009, and correlations of r=0.961 and 0.919 for the same periods. Critically, during these important periods, the alternative method yielded more accurate ILI predictions every week, and was typically more accurate during other influenza seasons.
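
The week-by-week comparison described in the Results can be sketched as the share of weeks in which one model's absolute error is smaller than another's. This is an illustrative assumption about how such a comparison might be computed, not the authors' procedure; the function name `weekly_win_share` is hypothetical.

```python
def weekly_win_share(model_a, model_b, observed):
    """Fraction of weeks in which model_a is strictly closer to observed ILI than model_b."""
    if not (len(model_a) == len(model_b) == len(observed)) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    wins = sum(
        1
        for a, b, o in zip(model_a, model_b, observed)
        if abs(a - o) < abs(b - o)  # ties do not count as wins
    )
    return wins / len(observed)
```

Under this sketch, a result of 1.0 would correspond to one model being more accurate in every week of the period examined.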

Conclusions: GFT may be inaccurate, but improved methodologic underpinnings can yield accurate predictions. Applying similar methods elsewhere can improve digital disease detection, with broader transparency, improved accuracy, and real-world public health impacts.

MeSH terms

  • Disease Outbreaks
  • Humans
  • Influenza A Virus, H1N1 Subtype / isolation & purification*
  • Influenza, Human / epidemiology*
  • Internet
  • Population Surveillance / methods
  • Search Engine*
  • Seasons