Health-related hot topic detection in online communities using text clustering

PLoS One. 2013;8(2):e56221. doi: 10.1371/journal.pone.0056221. Epub 2013 Feb 15.

Abstract

Recently, health-related social media services, especially online health communities, have rapidly emerged. Patients with various health conditions participate in online health communities to share their experiences and exchange healthcare knowledge. Exploring hot topics in online health communities helps us better understand patients' needs and interests in health-related knowledge. However, the statistical topic analysis employed in previous studies is becoming impractical for processing the rapidly increasing amount of online data. Automatic topic detection based on document clustering is an alternative approach for extracting health-related hot topics in online communities. In addition to the keyword-based features used in traditional text clustering, we integrate medical domain-specific features to represent the messages posted in online health communities. Three disease discussion boards from an online health community, devoted to lung cancer, breast cancer and diabetes, are used to test the effectiveness of topic detection. Experimental results demonstrate that health-related hot topics primarily include symptoms, examinations, drugs, procedures and complications. Further analysis reveals that significant differences also exist among the hot topics discussed on the different disease discussion boards.
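
The approach summarized in the abstract (keyword features combined with medical domain-specific features, followed by document clustering) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the tiny `MEDICAL_LEXICON`, the `domain_weight` and `threshold` values, the sample messages, and the choice of a greedy single-pass clustering, which stands in for whatever clustering algorithm the authors actually used.

```python
import math
import re
from collections import Counter

# Hypothetical mini-lexicon (an assumption, not from the paper): maps
# surface terms to medical categories of the kind named in the abstract.
MEDICAL_LEXICON = {
    "insulin": "drug", "metformin": "drug",
    "biopsy": "examination", "scan": "examination",
    "cough": "symptom", "fatigue": "symptom",
}

def featurize(messages, domain_weight=2.0):
    """Build one sparse vector per message: TF-IDF keywords plus category counts."""
    docs = [re.findall(r"[a-z]+", m.lower()) for m in messages]
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))  # document frequency
    vectors = []
    for doc in docs:
        vec = Counter()
        for term, tf in Counter(doc).items():
            idf = math.log((1 + n) / (1 + df[term])) + 1.0  # smoothed IDF
            vec["kw:" + term] = tf * idf
            if term in MEDICAL_LEXICON:
                # Domain-specific feature: weighted count per medical category.
                vec["cat:" + MEDICAL_LEXICON[term]] += tf * domain_weight
        vectors.append(vec)
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

def hot_topics(messages, threshold=0.25, top_n=2):
    """Greedy single-pass clustering; returns clusters, largest first."""
    clusters = []  # each cluster: member indices and a summed centroid
    for i, vec in enumerate(featurize(messages)):
        best = max(clusters, key=lambda c: cosine(vec, c["centroid"]),
                   default=None)
        if best is not None and cosine(vec, best["centroid"]) >= threshold:
            best["members"].append(i)
            best["centroid"].update(vec)  # Counter.update adds the weights
        else:
            clusters.append({"members": [i], "centroid": Counter(vec)})
    clusters.sort(key=lambda c: len(c["members"]), reverse=True)
    return [(c["members"], [k for k, _ in c["centroid"].most_common(top_n)])
            for c in clusters]

# Invented sample messages, for illustration only.
messages = [
    "insulin dose after dinner",
    "metformin dose question",
    "biopsy results tomorrow",
    "scan results next week",
    "insulin dose too high",
]
topics = hot_topics(messages)
```

In this sketch a "hot topic" is simply a large cluster, labeled by its highest-weighted features. The category features let messages that mention different drugs (or different examinations) still fall into the same cluster, which is the intuition behind adding domain-specific features to plain keywords.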

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cluster Analysis
  • Communication*
  • Humans
  • Internet* / statistics & numerical data
  • Self-Help Groups* / statistics & numerical data
  • Social Support*

Grants and funding

This work was supported by the National Natural Science Foundation of China (NSFC), grant 71171131. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.