Influenza-like illness surveillance on Twitter through automated learning of naïve language

PLoS One. 2013 Dec 4;8(12):e82489. doi: 10.1371/journal.pone.0082489. eCollection 2013.

Abstract

Twitter has the potential to be a timely and cost-effective source of data for syndromic surveillance. When describing an illness, Twitter users often report a combination of symptoms in naïve, everyday language rather than a suspected or final diagnosis. We developed a minimally trained algorithm that exploits the abundance of health-related web pages to identify all jargon expressions related to a specific technical term. We then translated an influenza case definition into a Boolean query in which each symptom is described by a technical term and all related jargon expressions identified by the algorithm. Subsequently, we monitored all tweets that reported a combination of symptoms satisfying the case definition query. To geolocate messages, we defined 3 localization strategies based on codes associated with each tweet. We found a high correlation coefficient between the trend of our influenza-positive tweets and the influenza-like illness (ILI) trends identified by traditional US surveillance systems.
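The idea of turning a case definition into a Boolean query over symptom vocabularies can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the study: the case definition used (fever AND (cough OR sore throat)), the symptom lexicon `SYMPTOM_TERMS`, and the function names. In the study the everyday expressions were learned automatically from health-related web pages; in this sketch they are simply hard-coded.

```python
import re

# Hypothetical lexicon: each symptom maps to its technical term plus
# everyday expressions. These entries are illustrative placeholders,
# not the vocabulary identified by the study's algorithm.
SYMPTOM_TERMS = {
    "fever": ["fever", "high temperature", "burning up"],
    "cough": ["cough", "coughing", "hacking"],
    "sore_throat": ["sore throat", "pharyngitis", "throat hurts"],
}


def _compile(expressions):
    """Build one case-insensitive, whole-word pattern for a symptom."""
    alternation = "|".join(re.escape(e) for e in expressions)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


SYMPTOM_PATTERNS = {name: _compile(exprs) for name, exprs in SYMPTOM_TERMS.items()}


def symptoms_in(text):
    """Return the set of symptoms whose expressions appear in the text."""
    return {name for name, pat in SYMPTOM_PATTERNS.items() if pat.search(text)}


def matches_case_definition(text):
    """Assumed case definition: fever AND (cough OR sore throat)."""
    found = symptoms_in(text)
    return "fever" in found and bool(found & {"cough", "sore_throat"})
```

For example, `matches_case_definition("burning up and coughing all night")` evaluates to `True`, while a message that mentions only a cough does not satisfy the query. Keeping the lexicon separate from the Boolean logic means the vocabulary can be extended without touching the case definition itself.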

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computer Simulation
  • Humans
  • Influenza, Human / epidemiology*
  • Internet*
  • Population Surveillance / methods*
  • Terminology as Topic*

Grants and funding

The study was funded by the Bambino Gesù Children's Hospital. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.