PLoS Comput Biol, 10 (4), e1003581

Wikipedia Usage Estimates Prevalence of Influenza-Like Illness in the United States in Near Real-Time

David J McIver et al. PLoS Comput Biol.

Abstract

Circulating levels of both seasonal and pandemic influenza require constant surveillance to ensure the health and safety of the population. While up-to-date information is critical, traditional surveillance systems can have data availability lags of up to two weeks. We introduce a novel method of estimating, in near real time, the level of influenza-like illness (ILI) in the United States (US) by monitoring the rate of particular Wikipedia article views on a daily basis. We calculated the number of times certain influenza- or health-related Wikipedia articles were accessed each day between December 2007 and August 2013 and compared these data to official ILI activity levels provided by the Centers for Disease Control and Prevention (CDC). We developed a Poisson model that accurately estimates the level of ILI activity in the American population, up to two weeks ahead of the CDC, with an absolute average difference between the two estimates of just 0.27% over 294 weeks of data. Wikipedia-derived ILI models performed well through both abnormally high media coverage events (such as the 2009 H1N1 pandemic) and unusually severe influenza seasons (such as the 2012-2013 influenza season). Wikipedia usage accurately estimated the week of peak ILI activity 17% more often than Google Flu Trends data and was often more accurate in its measure of ILI intensity. With further study, this method could potentially be implemented to monitor ILI activity in the US continuously and to support traditional influenza surveillance tools.
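
To make the modelling idea concrete, the following is a minimal sketch of a Poisson regression with a log link, fitted by Newton-Raphson. It is not the authors' model: the single predictor, the weekly counts, and the names `fit_poisson`, `views`, and `ili_counts` are all hypothetical, chosen only to illustrate how page-view rates could be related to ILI counts.

```python
import math

def fit_poisson(x, y, iterations=25):
    """Fit log(E[y]) = b0 + b1 * x by Newton-Raphson (one hypothetical predictor)."""
    # Start from the intercept-only fit so the first Newton step is stable.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Gradient of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix (2x2) for the log link.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton update: beta += inverse(information) * gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: weekly article views (thousands) and ILI visit counts.
views = [1.0, 1.5, 2.0, 3.0, 4.0, 3.5, 2.5, 1.2]
ili_counts = [12, 15, 22, 40, 70, 55, 30, 13]

b0, b1 = fit_poisson(views, ili_counts)
fitted = [math.exp(b0 + b1 * v) for v in views]
print(f"intercept={b0:.3f}, slope={b1:.3f}")
```

A positive slope here means more article views go with higher expected ILI counts. A real analysis would likely use many article-view predictors and a statistics library rather than a hand-written solver.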

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Time series plot of CDC ILI data versus estimated ILI data.
(A) Wikipedia Full Model (Mf) accurately estimated 3 out of 6 ILI activity peaks and had a mean absolute difference of 0.27% compared to CDC ILI data. (B) Wikipedia Lasso Model (Ml) accurately estimated 2 out of 6 ILI activity peaks and had a mean absolute difference of 0.29% compared to CDC ILI data. (C) Google Flu Trends (GFT) model accurately estimated 2 out of 6 ILI activity peaks and had a mean absolute difference of 0.42% compared to CDC ILI data.
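
The two comparison metrics in this caption can be sketched as follows. This is an illustration only: the weekly values are hypothetical, and it assumes a peak counts as accurately estimated when the model and the CDC series peak in the same week, which the caption itself does not spell out.

```python
def mean_absolute_difference(estimated, observed):
    """Average of |estimate - observation| across weeks, in percentage points."""
    return sum(abs(e - o) for e, o in zip(estimated, observed)) / len(observed)

def peak_week(series):
    """Index of the week with the highest ILI level."""
    return max(range(len(series)), key=series.__getitem__)

# Hypothetical weekly % ILI for one season (not data from the paper).
cdc_ili = [1.2, 1.8, 2.9, 4.1, 3.3, 2.0]
model_ili = [1.0, 2.0, 3.1, 3.9, 3.6, 2.2]

mad = mean_absolute_difference(model_ili, cdc_ili)
same_peak = peak_week(model_ili) == peak_week(cdc_ili)
print(f"mean absolute difference: {mad:.2f} percentage points")
print(f"peak week matched: {same_peak}")
```

Repeating the peak check once per season and counting the matches would give a tally in the same form as the "out of 6" figures above.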
