A large-scale quantitative analysis of latent factors and sentiment in online doctor reviews

J Am Med Inform Assoc. Nov-Dec 2014;21(6):1098-103. doi: 10.1136/amiajnl-2014-002711. Epub 2014 Jun 10.


Online physician reviews are a massive and potentially rich source of information capturing patient sentiment regarding healthcare. We analyze a corpus comprising nearly 60,000 such reviews with a state-of-the-art probabilistic model of text. We describe a probabilistic generative model that captures latent sentiment across aspects of care (eg, interpersonal manner). We target specific aspects by leveraging a small set of manually annotated reviews. We perform regression analysis to assess whether model output improves correlation with state-level measures of healthcare. We report both qualitative and quantitative results. Model output correlates with state-level measures of quality healthcare, including patient likelihood of visiting their primary care physician within 14 days of discharge (p=0.03), and using the proposed model better predicts this outcome (p=0.10). We find similar results for healthcare expenditure. Generative models of text can recover important information from online physician reviews, facilitating large-scale analyses of such reviews.
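The regression comparison described above (whether adding model output improves prediction of a state-level measure) can be illustrated with a nested ordinary least squares test. This is a minimal sketch under stated assumptions, not the authors' analysis: the data are synthetic, the variable names (`baseline`, `sentiment`, `outcome`) are hypothetical, and the choice of a baseline predictor and of an F test is assumed for illustration only.

```python
import numpy as np

def ols_rss(X, y):
    """Fit least squares with an intercept and return the residual sum of squares."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def nested_f_statistic(X_base, X_extra, y):
    """F statistic for whether adding the X_extra columns to X_base reduces residual error."""
    y = np.asarray(y, dtype=float)
    X_base = np.asarray(X_base, dtype=float).reshape(len(y), -1)
    X_extra = np.asarray(X_extra, dtype=float).reshape(len(y), -1)
    rss_base = ols_rss(X_base, y)
    rss_full = ols_rss(np.column_stack([X_base, X_extra]), y)
    q = X_extra.shape[1]                             # number of added predictors
    df_resid = len(y) - (1 + X_base.shape[1] + q)    # residual degrees of freedom, full model
    f_stat = ((rss_base - rss_full) / q) / (rss_full / df_resid)
    return f_stat, rss_base, rss_full

# Synthetic stand-in data (NOT from the paper): one row per hypothetical state.
rng = np.random.default_rng(0)
n_states = 50
baseline = rng.normal(size=n_states)    # hypothetical baseline predictor, e.g. a mean rating
sentiment = rng.normal(size=n_states)   # hypothetical model-inferred sentiment score
outcome = 0.5 * baseline + 0.8 * sentiment + rng.normal(scale=0.5, size=n_states)

f_stat, rss_base, rss_full = nested_f_statistic(baseline, sentiment, outcome)
print(f"RSS baseline={rss_base:.2f}, RSS with model output={rss_full:.2f}, F={f_stat:.2f}")
```

A p-value for the improvement could then be read from the F distribution with `q` and `df_resid` degrees of freedom, for example via `scipy.stats.f.sf`.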

Keywords: natural language processing; physician reviews; social media; topic modeling.

Publication types

  • Evaluation Study

MeSH terms

  • Internet*
  • Models, Statistical
  • Patient Satisfaction*
  • Physician-Patient Relations
  • Physicians / standards*
  • Quality of Health Care*
  • Regression Analysis
  • United States