Influenza forecasting with Google Flu Trends

PLoS One. 2013;8(2):e56176. doi: 10.1371/journal.pone.0056176. Epub 2013 Feb 14.

Abstract

Background: We developed a practical influenza forecast model based on real-time, geographically focused, and easy to access data, designed to provide individual medical centers with advanced warning of the expected number of influenza cases, thus allowing for sufficient time to implement interventions. Secondly, we evaluated the effects of incorporating a real-time influenza surveillance system, Google Flu Trends, and meteorological and temporal information on forecast accuracy.

Methods: Forecast models designed to predict one week in advance were developed from weekly counts of confirmed influenza cases over seven seasons (2004-2011) divided into seven training and out-of-sample verification sets. Forecasting procedures using classical Box-Jenkins, generalized linear models (GLM), and generalized linear autoregressive moving average (GARMA) methods were employed to develop the final model and assess the relative contribution of external variables such as, Google Flu Trends, meteorological data, and temporal information.

Results: A GARMA(3,0) forecast model with Negative Binomial distribution integrating Google Flu Trends information provided the most accurate influenza case predictions. The model, on the average, predicts weekly influenza cases during 7 out-of-sample outbreaks within 7 cases for 83% of estimates. Google Flu Trend data was the only source of external information to provide statistically significant forecast improvements over the base model in four of the seven out-of-sample verification sets. Overall, the p-value of adding this external information to the model is 0.0005. The other exogenous variables did not yield a statistically significant improvement in any of the verification sets.

Conclusions: Integer-valued autoregression of influenza cases provides a strong base forecast model, which is enhanced by the addition of Google Flu Trends confirming the predictive capabilities of search query based syndromic surveillance. This accessible and flexible forecast model can be used by individual medical centers to provide advanced warning of future influenza cases.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Adult
  • Child
  • Disease Outbreaks
  • Humans
  • Influenza, Human / epidemiology*
  • Linear Models
  • Models, Biological
  • Population Surveillance
  • Search Engine / methods*
  • Seasons
  • Weather

Grant support

This work was supported by the Department of Homeland Security (PACER: National Center for Study of Preparedness and Response [grant number: 2010-ST-061-PA0001]); the National Science Foundation Systems Engineering and Design Program [grant number: NSF CMMI 0927207]; and the Natural Sciences and Engineering Research Council of Canada. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.