Identifying populations of heart failure (HF) patients is paramount to research efforts aimed at developing strategies to effectively reduce the burden of this disease. The use of electronic medical record (EMR) data for this purpose is challenging given the syndromic nature of HF and the need to distinguish HF with preserved or reduced ejection fraction. Using a gold standard cohort of manually abstracted cases, an EMR-driven phenotype algorithm based on structured and unstructured data was developed to identify all the cases. The resulting algorithm was executed in two cohorts from the Electronic Medical Records and Genomics (eMERGE) Network with a positive predictive value of >95 %. The algorithm was expanded to include three hierarchical definitions of HF (i.e., definite, probable, possible) based on the degree of confidence of the classification to capture HF cases in a whole population whereby increasing the algorithm utility for use in e-Epidemiologic research.
Keywords: Electronic medical records; Heart failure; Natural language processing; Phenotyping; Ventricular ejection fraction.