Robust two-stage influenza prediction model considering regular and irregular trends

PLoS One. 2020 May 21;15(5):e0233126. doi: 10.1371/journal.pone.0233126. eCollection 2020.

Abstract

Influenza causes numerous deaths worldwide every year, so predicting the number of influenza patients is an important task for medical institutions. Two types of data on influenza-like illnesses (ILIs) are often used for flu prediction: (1) historical data and (2) user-generated content (UGC) data on the web, such as search queries and tweets. Historical data are well suited to the normal state but poorly suited to irregular phenomena. In contrast, UGC data are advantageous for irregular phenomena. So far, no effective model providing the benefits of both types of data has been devised. This study proposes a novel model, designated the two-stage model, which combines historical and UGC data. The basic idea is that regular trends are first estimated using the historical data-based model, and irregular trends are then predicted by the UGC data-based model. This approach is practically useful because the two models can be trained separately. Thus, if a UGC provider changes its service, our model could still produce better performance because the first stage of the model remains stable. Experiments on the US and Japan datasets demonstrated the basic feasibility of the proposed approach. In the dropout (pseudo-noise) test, which assumes that a UGC service would change, the proposed method also showed robustness against outliers. The proposed model is suitable for prediction of seasonal flu.
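The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the models used, so the sketch assumes a per-position seasonal mean for the first stage and a one-variable least-squares fit on the residuals for the second stage. All function names, the toy data, and the fallback to the baseline when the UGC signal is missing are hypothetical choices made for illustration only.

```python
from statistics import mean


def fit_seasonal_baseline(history, period):
    """Stage 1 (assumed form): mean historical count at each position of the season."""
    return [mean(history[i::period]) for i in range(period)]


def fit_residual_model(residuals, ugc):
    """Stage 2 (assumed form): least-squares line mapping a UGC signal to residuals."""
    x_bar, r_bar = mean(ugc), mean(residuals)
    var = sum((x - x_bar) ** 2 for x in ugc)
    if var == 0:
        slope = 0.0  # a constant signal carries no information about the residuals
    else:
        cov = sum((x - x_bar) * (r - r_bar) for x, r in zip(ugc, residuals))
        slope = cov / var
    return slope, r_bar - slope * x_bar


def predict(baseline, residual_model, t, ugc_value=None):
    """Regular trend plus UGC-based correction; baseline only if UGC is missing."""
    regular = baseline[t % len(baseline)]
    if ugc_value is None:
        # Hypothetical fallback: the first stage still gives a usable estimate.
        return regular
    slope, intercept = residual_model
    return regular + slope * ugc_value + intercept


# Toy data (invented): two seasons of four periods each, and one UGC signal.
period = 4
history = [10, 20, 30, 20, 14, 24, 34, 24]
ugc = [0, 0, 0, 0, 2, 2, 2, 2]

# The two stages are trained separately, as the abstract describes.
baseline = fit_seasonal_baseline(history, period)
residuals = [y - baseline[t % period] for t, y in enumerate(history)]
model = fit_residual_model(residuals, ugc)

with_ugc = predict(baseline, model, t=8, ugc_value=3)
without_ugc = predict(baseline, model, t=8)
```

Because the stages are fitted independently, the second stage can be retrained or dropped when the UGC source changes without touching the seasonal baseline.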

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Disease Outbreaks*
  • Humans
  • Influenza, Human / epidemiology*
  • Influenza, Human / transmission*
  • Models, Biological*
  • Predictive Value of Tests

Grants and funding

This study was supported in part by JSPS KAKENHI Grant Number JP19K20279, Health and Labor Sciences Research Grant Number H30-shinkougyousei-shitei-004, and Yahoo! Japan. The funder provided support in the form of salaries for authors (Dr Nobuyuki Shimizu and Mr Sumio Fujita), but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.