Text and structural data mining of influenza mentions in Web and social media

Int J Environ Res Public Health. 2010 Feb;7(2):596-615. doi: 10.3390/ijerph7020596. Epub 2010 Feb 22.


Text and structural data mining of web and social media (WSM) provides a novel disease surveillance resource and can identify online communities for targeted public health communications (PHC) to ensure wide dissemination of pertinent information. WSM that mention influenza are harvested over a 24-week period, from 5 October 2008 to 21 March 2009. Link analysis reveals communities for targeted PHC. Text mining identifies trends in flu posts that correlate with real-world influenza-like illness patient report data. We also apply a graph-based data mining technique to detect anomalies among flu blogs connected by publisher type, links, and user tags.

Keywords: disease surveillance; graph-based data mining; health informatics; public health epidemiology; social network analysis; web and social media.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Humans
  • Influenza, Human*
  • Information Storage and Retrieval*
  • Internet*
  • Population Surveillance
  • Social Support*