A practical approach for content mining of Tweets

Am J Prev Med. 2013 Jul;45(1):122-129. doi: 10.1016/j.amepre.2013.02.025.


Use of data generated through social media for health studies is gradually increasing. Twitter is a short-text messaging system launched in 2006 that now has more than 100 million users generating over 300 million Tweets every day. Twitter may be used to gain real-world insights for promoting healthy behaviors. The purposes of this paper are to describe a practical approach to analyzing Tweet content and to illustrate its application to the topic of physical activity. The approach includes five steps: (1) selecting keywords to gather an initial set of Tweets to analyze; (2) importing data; (3) preparing data; (4) analyzing data (topic, sentiment, and ecologic context); and (5) interpreting data. The steps are implemented using tools that are publicly available, free of charge, and designed for researchers with limited programming skills. Content mining of Tweets can contribute to addressing challenges in health behavior research.
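The five-step workflow can be illustrated with a minimal, hypothetical Python sketch of steps 1 through 4, using only the standard library. It is not the toolset the authors used. It assumes Tweets have already been exported as dictionaries with a `text` field and an optional `place` field. The keyword list, stopword list, and tiny sentiment lexicon are illustrative placeholders, not validated resources. Topic analysis is approximated by term frequencies and ecologic context by a tally of the assumed `place` field. Step 5, interpretation, is left to the researcher.

```python
"""Hypothetical sketch of a keyword-based Tweet content-mining workflow.

Assumptions (not from the paper): Tweets are dicts with a "text" field and an
optional "place" field; KEYWORDS, STOPWORDS, POSITIVE and NEGATIVE are
illustrative placeholders only.
"""
import re
from collections import Counter

# Step 1: hypothetical keywords for the physical-activity example.
KEYWORDS = {"run", "running", "gym", "workout", "walk", "walking"}

# Hypothetical tiny sentiment lexicon; a real study would use a validated one.
POSITIVE = {"love", "great", "fun", "happy", "good"}
NEGATIVE = {"hate", "tired", "sore", "bad", "lazy"}

# Hypothetical stopword list used to filter terms for topic counts.
STOPWORDS = {"the", "a", "an", "to", "and", "i", "my", "is", "at", "of",
             "this", "it", "so"}


def tokenize(text):
    """Step 3 (preparing): lowercase, drop URLs and @mentions, keep words."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return re.findall(r"[a-z']+", text)


def select_tweets(tweets, keywords):
    """Steps 1-2: keep only Tweets containing at least one keyword token."""
    return [t for t in tweets if keywords & set(tokenize(t["text"]))]


def sentiment(tokens):
    """Step 4 (sentiment): positive lexicon hits minus negative hits."""
    pos = sum(tok in POSITIVE for tok in tokens)
    neg = sum(tok in NEGATIVE for tok in tokens)
    return pos - neg


def analyze(tweets, keywords=KEYWORDS, top_n=5):
    """Step 4: term frequencies, mean sentiment, and a context tally."""
    selected = select_tweets(tweets, keywords)
    term_counts = Counter()
    scores = []
    for t in selected:
        tokens = tokenize(t["text"])
        term_counts.update(tok for tok in tokens if tok not in STOPWORDS)
        scores.append(sentiment(tokens))
    # Ecologic context approximated by the assumed "place" field.
    contexts = Counter(t.get("place", "unknown") for t in selected)
    return {
        "n_selected": len(selected),
        "top_terms": term_counts.most_common(top_n),
        "mean_sentiment": sum(scores) / len(scores) if scores else 0.0,
        "contexts": dict(contexts),
    }


if __name__ == "__main__":
    sample = [
        {"text": "Love running at the park this morning!", "place": "park"},
        {"text": "So tired and sore after the gym", "place": "gym"},
        {"text": "Watching TV all day"},
    ]
    print(analyze(sample))
```

A simple lexicon count like this is easy to inspect and adjust, but it ignores negation, sarcasm, and slang, so a validated sentiment tool would be a natural substitute in practice.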

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Data Collection / methods
  • Data Mining / methods*
  • Health Behavior*
  • Humans
  • Internet / statistics & numerical data*
  • Research / organization & administration
  • Research Design
  • Social Media / statistics & numerical data*