NLP based congestive heart failure case finding: A prospective analysis on statewide electronic medical records

Int J Med Inform. 2015 Dec;84(12):1039-47. doi: 10.1016/j.ijmedinf.2015.06.007. Epub 2015 Jul 2.


Background: To manage congestive heart failure (CHF) patients proactively, an effective CHF case finding algorithm must process both structured and unstructured electronic medical records (EMR), allowing complementary and cost-efficient identification of CHF patients.

Methods and results: We set out to identify CHF cases from both codified EMR data and cases found by natural language processing (NLP). Using narrative clinical notes from all Maine Health Information Exchange (HIE) patients, the NLP case finding algorithm was developed retrospectively (July 1, 2012-June 30, 2013) on a random subset of HIE-associated facilities and blind-tested on the remaining facilities. The NLP-based method was integrated into a live HIE population exploration system and validated prospectively (July 1, 2013-June 30, 2014). A total of 18,295 codified CHF patients were included in the Maine HIE. Among the 253,803 subjects without CHF codings, our case finding algorithm prospectively identified 2411 uncodified CHF cases. The positive predictive value (PPV) was 0.914, and 70.1% of these 2411 cases were found to have CHF histories documented in the clinical notes.
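
The facility-level development/blind-test split and the PPV figure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the facility identifiers, the 50/50 split fraction, and the chart-review counts passed to the PPV function are all hypothetical; the abstract does not report them. Only the figures 2411 and 70.1% are taken from the abstract.

```python
import random


def split_facilities(facility_ids, dev_fraction=0.5, seed=0):
    """Randomly partition facilities into development and blind-test sets.

    Splitting by facility (rather than by patient or note) keeps every note
    from a given facility on one side of the split. The 0.5 fraction is an
    assumption made for illustration only.
    """
    ids = sorted(set(facility_ids))
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    cut = int(len(ids) * dev_fraction)
    return set(ids[:cut]), set(ids[cut:])


def positive_predictive_value(true_positives, false_positives):
    """PPV = TP / (TP + FP): the share of flagged cases that are real."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("PPV is undefined when no cases are flagged")
    return true_positives / flagged


# Hypothetical facility identifiers, for illustration only.
facilities = [f"facility_{i}" for i in range(10)]
dev_set, blind_test_set = split_facilities(facilities)

# Hypothetical review counts chosen so the result matches the reported 0.914.
example_ppv = positive_predictive_value(true_positives=914, false_positives=86)

# Arithmetic on figures stated in the abstract: 70.1% of the 2411 cases.
uncodified_cases = 2411
cases_with_history = round(0.701 * uncodified_cases)
```

Under these assumptions, 70.1% of the 2411 uncodified cases corresponds to roughly 1,690 patients whose notes document a CHF history.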

Conclusions: A CHF case finding algorithm was developed, tested and prospectively validated. The successful integration of the CHF case finding algorithm into the live Maine HIE system is expected to improve CHF care in Maine.

Keywords: Congestive heart failure; Electronic medical record; Natural language processing; Prospective validation; Random forests.

Publication types

  • Evaluation Study
  • Validation Study

MeSH terms

  • Algorithms*
  • Data Mining / methods*
  • Decision Support Systems, Clinical / organization & administration
  • Electronic Health Records / statistics & numerical data*
  • Heart Failure / epidemiology*
  • Humans
  • Maine / epidemiology
  • Natural Language Processing*
  • Pattern Recognition, Automated / methods*
  • Prevalence
  • Prospective Studies
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Vocabulary, Controlled