Rich Text Formatted EHR Narratives: A Hidden and Ignored Trove

Stud Health Technol Inform. 2019 Aug 21:264:472-476. doi: 10.3233/SHTI190266.

Abstract

This study presents an approach for mining structured information from clinical narratives in Electronic Health Records (EHRs) by using Rich Text Formatted (RTF) records. RTF is adopted by many medical information management systems. These files contain rich structural information that can be extracted and interpreted, yet such information is largely ignored. We investigate multiple types of EHR narratives in the Enterprise Data Warehouse of a large multisite healthcare chain consisting of both an academic medical center and community hospitals. We focus on the RTF constructs related to tables and sections that are not available in plain-text EHR narratives. We show how to parse these RTF constructs and analyze their prevalence and characteristics across multiple types of EHR narratives. Our case study demonstrates the additional utility of features derived from RTF constructs over plain-text-oriented NLP.
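
To illustrate the kind of table structure that RTF carries and plain text loses, the following is a minimal sketch, not the authors' parser. It assumes only the standard RTF control words \trowd (start of a row definition), \cell (end of a cell) and \row (end of a row). The function name extract_table_rows and the sample note are hypothetical, and the sketch ignores nested tables, hex and Unicode escapes, and font or style groups.

```python
import re

# One token is: a control word (optional numeric parameter, optional single
# delimiter space), a control symbol, a group brace, or a run of plain text.
TOKEN = re.compile(r"\\([a-zA-Z]+)(-?\d+)? ?|\\(.)|([{}])|([^\\{}]+)", re.DOTALL)


def extract_table_rows(rtf):
    """Return table rows as lists of cell strings (hypothetical helper)."""
    rows, row, cell = [], [], []
    for match in TOKEN.finditer(rtf):
        word, _param, symbol, _brace, text = match.groups()
        if word == "trowd":
            # A new row definition starts: drop any non-table text seen so far.
            row, cell = [], []
        elif word == "cell":
            # \cell closes the current cell.
            row.append("".join(cell).strip())
            cell = []
        elif word == "row":
            # \row closes the current row.
            rows.append(row)
            row = []
        elif text is not None:
            # Raw line breaks in RTF source are not part of the content.
            cell.append(text.replace("\r", "").replace("\n", ""))
        elif symbol in ("{", "}", "\\"):
            # Escaped literal brace or backslash.
            cell.append(symbol)
        # All other control words and group braces are ignored in this sketch.
    return rows


# Made-up sample for illustration only; not taken from any real record.
sample = (
    r"{\rtf1\ansi "
    r"\trowd\cellx2000\cellx4000\intbl Test\cell Result\cell\row "
    r"\trowd\cellx2000\cellx4000\intbl Hemoglobin\cell 13.5 g/dL\cell\row "
    r"\pard Plain paragraph text.\par}"
)

for table_row in extract_table_rows(sample):
    print(table_row)
```

In a plain-text export of the same note, the cell and row boundaries recovered here would typically be flattened into whitespace, which is why such features are unavailable to text-only NLP.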

Keywords: Electronic Health Records; Information Management; Natural Language Processing.

MeSH terms

  • Academic Medical Centers
  • Data Warehousing
  • Electronic Health Records*
  • Histological Techniques
  • Narration*