Influenza-like illness surveillance on Twitter through automated learning of naïve language

Francesco Gesualdo; Giovanni Stilo; Eleonora Agricola; Michaela V Gonfiantini; Elisabetta Pandolfi; Paola Velardi; Alberto E Tozzi

doi:10.1371/journal.pone.0082489

PLoS One. 2013 Dec 4;8(12):e82489. doi: 10.1371/journal.pone.0082489. eCollection 2013.

Authors

Francesco Gesualdo¹, Giovanni Stilo, Eleonora Agricola, Michaela V Gonfiantini, Elisabetta Pandolfi, Paola Velardi, Alberto E Tozzi

Affiliation

¹ Multifactorial Diseases and Complex Phenotypes Research Area, Bambino Gesù Children's Hospital IRCCS, Rome, Italy.

Abstract

Twitter has the potential to be a timely and cost-effective source of data for syndromic surveillance. When speaking of an illness, Twitter users often report a combination of symptoms, rather than a suspected or final diagnosis, using naïve, everyday language. We developed a minimally trained algorithm that exploits the abundance of health-related web pages to identify all jargon expressions related to a specific technical term. We then translated an influenza case definition into a Boolean query, in which each symptom is described by a technical term and all related jargon expressions identified by the algorithm. Subsequently, we monitored all tweets that reported a combination of symptoms satisfying the case definition query. To geolocalize messages, we defined three localization strategies based on codes associated with each tweet. We found a high correlation between the trend of our influenza-positive tweets and the ILI trends identified by traditional US surveillance systems.
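The idea of a case definition expressed as a Boolean query over symptom terms can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the lexicon entries in SYMPTOM_TERMS, the function names, and the simplified case definition (fever AND (cough OR sore throat)) are hypothetical and are not the lexicon or query used in the study.

```python
import re

# Hypothetical lexicon: each technical term maps to everyday expressions.
# In the study these expressions are learned automatically from web pages;
# the entries here are hand-written placeholders.
SYMPTOM_TERMS = {
    "fever": ["fever", "high temperature", "burning up"],
    "cough": ["cough", "coughing"],
    "sore throat": ["sore throat", "throat hurts", "scratchy throat"],
}


def has_symptom(text, symptom):
    """Return True if any expression for the symptom occurs as a whole phrase."""
    lowered = text.lower()
    return any(
        re.search(r"\b" + re.escape(expression) + r"\b", lowered)
        for expression in SYMPTOM_TERMS[symptom]
    )


def matches_case_definition(text):
    """Assumed simplified case definition: fever AND (cough OR sore throat)."""
    return has_symptom(text, "fever") and (
        has_symptom(text, "cough") or has_symptom(text, "sore throat")
    )


tweets = [
    "Burning up and my throat hurts so much",
    "I have a cough but nothing else",
]
positive = [tweet for tweet in tweets if matches_case_definition(tweet)]
```

In this sketch only the first tweet satisfies the query, because it reports both a fever expression and a sore throat expression. The matching is deliberately naive: it does not handle negation ("no fever"), misspellings, or figurative use.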

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Computer Simulation
Humans
Influenza, Human / epidemiology*
Internet*
Population Surveillance / methods*
Terminology as Topic*

Grants and funding

The study has been funded by the Bambino Gesù Children's Hospital. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.