Extracting Healthcare Quality Information from Unstructured Data

AMIA Annu Symp Proc. 2018 Apr 16;2017:1243-1252. eCollection 2017.


Healthcare quality research is a fundamental task that involves assessing treatment patterns and measuring the associated patient outcomes to identify potential areas for improving healthcare. While both qualitative and quantitative approaches are used, a major obstacle for the quantitative approach is that many useful healthcare quality indicators are buried within provider narrative notes, requiring expensive and laborious manual chart review to identify and measure them. Information extraction is a key Natural Language Processing (NLP) task for discovering and mining critical knowledge buried in unstructured clinical data. Nevertheless, widespread adoption of NLP has yet to materialize; the technical skills required for the development or use of such software present a major barrier for medical researchers wishing to employ these methods. In this paper we introduce Canary, a free and open-source solution designed for users without NLP or technical expertise, and apply it to four tasks, aiming to measure the frequency of: (1) insulin decline; (2) statin medication decline; (3) adverse reactions to statins; and (4) bariatric surgery counselling. Our results demonstrate that this approach facilitates mining of unstructured data with high accuracy, enabling the extraction of actionable healthcare quality insights from free-text data sources.

MeSH terms

  • Bariatric Surgery
  • Data Mining / methods*
  • Humans
  • Hydroxymethylglutaryl-CoA Reductase Inhibitors / adverse effects
  • Hydroxymethylglutaryl-CoA Reductase Inhibitors / therapeutic use
  • Hypoglycemic Agents / therapeutic use
  • Insulin / therapeutic use
  • Natural Language Processing*
  • Quality of Health Care*
  • Research Personnel
  • Software
  • Treatment Refusal

Substances

  • Hydroxymethylglutaryl-CoA Reductase Inhibitors
  • Hypoglycemic Agents
  • Insulin