Combining structured and unstructured data to identify a cohort of ICU patients who received dialysis

J Am Med Inform Assoc. Sep-Oct 2014;21(5):801-7. doi: 10.1136/amiajnl-2013-001915. Epub 2014 Jan 2.


Objective: To develop a generalizable method for identifying patient cohorts from electronic health record (EHR) data (in this case, patients having dialysis) that uses simple information retrieval (IR) tools.

Methods: We used the coded data and clinical notes from the 24,506 adult patients in the Multiparameter Intelligent Monitoring in Intensive Care database to identify patients who had dialysis. We used SQL queries to search the procedure, diagnosis, and coded nursing observations tables based on ICD-9 and local codes. We used a domain-specific search engine to find clinical notes containing terms related to dialysis. We manually validated the available records for a 10% random sample of patients who potentially had dialysis and a random sample of 200 patients who were not identified as having dialysis based on any of the sources.
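
As a rough illustration of the search step described in Methods, the sketch below queries three coded tables and a notes table, then takes the union of the patient IDs found. Everything specific in it is an assumption made for illustration: the table and column names, the local nursing code `DIALYSIS_OUT`, the sample rows, and the particular ICD-9 codes and search terms are not taken from the study. A case-insensitive substring match stands in for the domain-specific search engine.

```python
import sqlite3

# Hypothetical schema and rows; the real database layout is not described here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE procedures  (patient_id INTEGER, icd9_code TEXT);
CREATE TABLE diagnoses   (patient_id INTEGER, icd9_code TEXT);
CREATE TABLE nursing_obs (patient_id INTEGER, local_code TEXT);
CREATE TABLE notes       (patient_id INTEGER, text TEXT);
INSERT INTO procedures  VALUES (1, '39.95'), (2, '38.93');
INSERT INTO diagnoses   VALUES (3, 'V45.1'), (2, '428.0');
INSERT INTO nursing_obs VALUES (4, 'DIALYSIS_OUT');
INSERT INTO notes VALUES
    (1, 'Pt tolerated hemodialysis well.'),
    (5, 'CVVH started overnight.'),
    (2, 'No acute events.');
""")


def ids(query, params=()):
    """Return the set of patient IDs in the first column of the result."""
    return {row[0] for row in conn.execute(query, params)}


# Coded sources: illustrative ICD-9 codes plus an assumed local nursing code.
coded = (
    ids("SELECT patient_id FROM procedures WHERE icd9_code IN ('39.95', '54.98')")
    | ids("SELECT patient_id FROM diagnoses WHERE icd9_code = 'V45.1'")
    | ids("SELECT patient_id FROM nursing_obs WHERE local_code = 'DIALYSIS_OUT'")
)

# Notes: substring search as a simple stand-in for a search engine.
terms = ["dialysis", "cvvh"]  # assumed term list
from_notes = set()
for term in terms:
    from_notes |= ids(
        "SELECT patient_id FROM notes WHERE lower(text) LIKE ?", (f"%{term}%",)
    )

# Candidate cohort: any patient found by any source.
candidates = coded | from_notes
print(sorted(candidates))
```

Keeping each source as its own set makes it easy to report how many patients each source contributed before the sets are merged.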

Results: We identified 1844 patients who potentially had dialysis: 1481 from the three coded sources and 1624 from the clinical notes. Precision for identifying dialysis patients based on available data was estimated to be 78.4% (95% CI 71.9% to 84.2%) and recall was 100% (95% CI 86% to 100%).
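
The counts in Results imply how much the two kinds of source overlap. The short calculation below applies inclusion-exclusion, assuming the 1844 candidates are exactly the union of the coded-source group and the clinical-notes group; the breakdown is derived arithmetic, not a figure reported in the abstract.

```python
# Counts reported in Results.
total_any_source = 1844  # found by at least one source
from_coded = 1481        # found in the three coded sources
from_notes = 1624        # found in the clinical notes

# Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|
in_both = from_coded + from_notes - total_any_source
coded_only = total_any_source - from_notes
notes_only = total_any_source - from_coded

print(in_both, coded_only, notes_only)  # 1261 220 363
```

Under that assumption, 1261 patients were found by both kinds of source, 220 only by coded data, and 363 only by notes.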

Conclusions: Combining structured EHR data with information from clinical notes using simple queries increases the utility of both types of data for cohort identification. Patients identified by more than one source are more likely to meet the inclusion criteria; however, including patients found in any of the sources increases recall. This method is attractive because it is available to researchers with access to EHR data and off-the-shelf IR tools.
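
The trade-off stated in Conclusions can be seen on a toy example. The patients, source flags, and chart-review outcome below are entirely hypothetical and chosen only to show the mechanics: requiring agreement between sources raises precision, while accepting any source raises recall.

```python
# Hypothetical data: patient -> set of sources that flagged them.
flags = {
    "p1": {"procedure", "notes"},
    "p2": {"notes"},
    "p3": {"diagnosis", "nursing", "notes"},
    "p4": {"notes"},
    "p5": {"procedure"},
}
# Hypothetical manual review result: patients who truly had dialysis.
truly_dialysis = {"p1", "p2", "p3", "p5"}


def precision_recall(selected, truth):
    """Precision = TP / selected, recall = TP / truth."""
    tp = len(selected & truth)
    precision = tp / len(selected) if selected else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall


any_source = {p for p, s in flags.items() if len(s) >= 1}
multi_source = {p for p, s in flags.items() if len(s) >= 2}

print(precision_recall(any_source, truly_dialysis))    # (0.8, 1.0)
print(precision_recall(multi_source, truly_dialysis))  # (1.0, 0.5)
```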

Publication types

  • Comparative Study
  • Research Support, N.I.H., Intramural

MeSH terms

  • Adult
  • Electronic Health Records*
  • Humans
  • Information Storage and Retrieval / methods*
  • International Classification of Diseases
  • Kidney Failure, Chronic / therapy
  • Programming Languages
  • Renal Dialysis / statistics & numerical data*