Online physician reviews are a massive and potentially rich source of information capturing patient sentiment regarding healthcare. We analyze a corpus of nearly 60,000 such reviews using a state-of-the-art probabilistic generative model of text that captures latent sentiment across aspects of care (eg, interpersonal manner). We target specific aspects by leveraging a small set of manually annotated reviews. We then perform regression analysis to assess whether model output improves correlation with state-level measures of healthcare. We report both qualitative and quantitative results. Model output correlates with state-level measures of healthcare quality, including patient likelihood of visiting their primary care physician within 14 days of discharge (p=0.03), and using the proposed model better predicts this outcome (p=0.10). We find similar results for healthcare expenditure. Generative models of text can recover important information from online physician reviews, facilitating their large-scale analysis.
Keywords: natural language processing; physician reviews; social media; topic modeling.
Published by the BMJ Publishing Group Limited.