Automating and improving cardiovascular disease prediction using machine learning and EMR data features from a regional healthcare system

Int J Med Inform. 2022 Jul:163:104786. doi: 10.1016/j.ijmedinf.2022.104786. Epub 2022 Apr 29.

Abstract

Background: The ACC/AHA Pooled Cohort Equations (PCE) Risk Calculator is widely used in the US for primary prevention of atherosclerotic cardiovascular disease (ASCVD), but it may underestimate or overestimate risk in some populations. We therefore designed an automated, population-specific ASCVD risk calculator using machine-learning (ML) methods and electronic medical record (EMR) data, and compared its predictive power with that of the PCE calculator.

Methods and findings: We collected data from 101,110 unique EMRs of living patients from January 1, 2009 to April 30, 2020. ML techniques were applied to patient datasets that included either cross-sectional (CS) features only, or CS features combined with longitudinal (LT) features derived from vital statistics and laboratory values. We compared the utility of the models using a proposed new cost measure (Screened Cases Percentage @ Sensitivity level). All ML models tested achieved better predictive power than the PCE risk calculator. The random forest (RF) technique applied to the combination of CS and LT features (RF-LTC) produced the highest area under the curve (AUC), 0.902 (95% confidence interval (CI), 0.895-0.910). To detect 90% of all positive ASCVD cases, the best ML model required screening only 43% of patients, while the PCE risk calculator required screening 69% of patients.
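
The abstract names the Screened Cases Percentage @ Sensitivity level measure but does not define it. One plausible reading, given the 43% versus 69% comparison, is the smallest share of patients that must be screened, taking them in order of decreasing predicted risk, to capture a target fraction of all true ASCVD cases. The Python sketch below illustrates that reading only; the function name, its parameters, and the toy data are hypothetical and are not taken from the paper.

```python
def screened_pct_at_sensitivity(y_true, scores, sensitivity=0.90):
    """Percentage of patients screened, in descending order of risk score,
    before at least `sensitivity` of all positive cases have been captured.

    Assumed definition (not confirmed by the abstract). Ties in `scores`
    are broken by input order, because Python's sort is stable.
    """
    if len(y_true) != len(scores):
        raise ValueError("y_true and scores must have the same length")
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("at least one positive case is required")

    # Rank patients from highest to lowest predicted risk.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)

    found = 0
    for k, (_, label) in enumerate(ranked, start=1):
        found += label
        # Stop at the first cutoff that reaches the target sensitivity.
        if found / n_pos >= sensitivity:
            return 100.0 * k / len(ranked)
    return 100.0  # Only reached if sensitivity > 1.


# Toy example with made-up labels (1 = ASCVD case) and risk scores.
labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
risk = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
print(screened_pct_at_sensitivity(labels, risk, sensitivity=0.90))  # 60.0
```

Under this reading, a lower value at a fixed sensitivity means fewer patients need to be screened to find the same share of cases, which is the direction of the comparison reported between the best ML model and the PCE calculator.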

Conclusions: Prediction models built using ML techniques improved ASCVD prediction and reduced the number of screenings required to predict ASCVD compared with the PCE calculator alone. Combining LT and CS features in the ML models significantly improved ASCVD prediction compared with using CS features alone.

Keywords: Cardiovascular disease; Electronic health record; Machine learning; Mass screening; Risk.

MeSH terms

  • Atherosclerosis* / diagnosis
  • Cardiovascular Diseases* / diagnosis
  • Cardiovascular Diseases* / epidemiology
  • Cardiovascular Diseases* / prevention & control
  • Cross-Sectional Studies
  • Delivery of Health Care
  • Electronic Health Records
  • Humans
  • Machine Learning
  • Risk Assessment / methods
  • Risk Factors