Influenza forecasting for French regions combining EHR, web and climatic data sources with a machine learning ensemble approach

PLoS One. 2021 May 19;16(5):e0250890. doi: 10.1371/journal.pone.0250890. eCollection 2021.


Effective and timely disease surveillance systems have the potential to help public health officials design interventions to mitigate the effects of disease outbreaks. Currently, healthcare-based disease monitoring systems in France offer influenza activity information that lags real time by one to three weeks. This temporal data gap introduces uncertainty that prevents public health officials from having a timely perspective on population-level disease activity. Here, we present a machine-learning modeling approach that produces real-time estimates and short-term forecasts of influenza activity for the twelve continental regions of France by leveraging multiple disparate data sources, including Google search activity, real-time local weather information, flu-related Twitter micro-blogs, electronic health records data, and historical synchronicities in disease activity across regions. Our results show that all data sources contribute to improving influenza surveillance and that machine-learning ensembles that combine all data sources lead to accurate and timely predictions.
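The abstract does not say how the ensemble combines its inputs. As a purely illustrative sketch (not the authors' method), one generic way to merge forecasts from several source-specific models is to weight each model by the inverse of its mean squared error over a recent training window. The source names (`"search"`, `"ehr"`) and all numbers below are hypothetical toy values, not data from the study.

```python
from statistics import fmean


def inverse_mse_weights(past_preds: dict, observed: list) -> dict:
    """Hypothetical helper: weight each source-specific model by 1 / MSE
    over a training window, then normalize the weights to sum to 1."""
    inv = {}
    for source, preds in past_preds.items():
        if len(preds) != len(observed):
            raise ValueError(f"{source}: prediction/observation length mismatch")
        mse = fmean((p - y) ** 2 for p, y in zip(preds, observed))
        inv[source] = 1.0 / max(mse, 1e-12)  # guard against division by zero on a perfect fit
    total = sum(inv.values())
    return {source: w / total for source, w in inv.items()}


def ensemble_forecast(weights: dict, current_preds: dict) -> float:
    """Hypothetical helper: weighted average of this week's per-source forecasts."""
    return sum(weights[s] * current_preds[s] for s in weights)


# Toy example: observed influenza activity for three past weeks and the
# retrospective predictions of two hypothetical source-specific models.
observed = [1.0, 2.0, 3.0]
past_preds = {"search": [1.1, 2.1, 2.9], "ehr": [1.5, 2.5, 3.5]}
weights = inverse_mse_weights(past_preds, observed)
forecast = ensemble_forecast(weights, {"search": 4.0, "ehr": 5.0})
print(weights, forecast)
```

Inverse-error weighting is shown only because it is short and transparent; stacking with a trained meta-model is a common alternative when enough history is available.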

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computer Systems / statistics & numerical data
  • Disease Outbreaks / statistics & numerical data
  • Electronic Health Records
  • Epidemiological Monitoring
  • Forecasting / methods*
  • France / epidemiology
  • Humans
  • Influenza, Human / epidemiology*
  • Information Storage and Retrieval
  • Internet
  • Machine Learning*
  • Models, Statistical
  • Public Health Surveillance / methods
  • Social Media / statistics & numerical data
  • Weather