Predicting Future Cardiovascular Events in Patients With Peripheral Artery Disease Using Electronic Health Record Data

Circ Cardiovasc Qual Outcomes. 2019 Mar;12(3):e004741. doi: 10.1161/CIRCOUTCOMES.118.004741.


Background: Patients with peripheral artery disease (PAD) are at risk of major adverse cardiac and cerebrovascular events. There are no readily available risk scores that can accurately identify which patients are most likely to sustain an event, making it difficult to identify those who might benefit from more aggressive intervention. Thus, we aimed to develop a novel predictive model, using machine learning methods on electronic health record data, to identify which PAD patients are most likely to develop major adverse cardiac and cerebrovascular events.

Methods and results: Data were derived from patients diagnosed with PAD at 2 tertiary care institutions. Predictive models were built using a common data model that allowed for utilization of both structured (coded) and unstructured (text) data. Only data from time of entry into the health system up to PAD diagnosis were used for modeling. Models were developed and tested using nested cross-validation. A total of 7686 patients were included in training the predictive models. Using almost 1000 variables, our best predictive model identified which PAD patients would go on to develop major adverse cardiac and cerebrovascular events with an area under the curve of 0.81 (95% CI, 0.80-0.83).
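
For readers unfamiliar with the reported metric, the following is a minimal sketch of how an area under the ROC curve and a 95% confidence interval can be computed from predicted risk scores. The abstract does not state how its interval was obtained; the percentile bootstrap used here is an assumption chosen for illustration, the function names `auc` and `bootstrap_ci` are hypothetical, and the toy labels and scores are invented rather than taken from the study.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case is
    scored above a randomly chosen negative case (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an illustrative choice,
    not necessarily the method used in the paper)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contains one class only; AUC is undefined
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Invented toy data: 1 = event occurred, 0 = no event.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auc(toy_labels, toy_scores))  # 0.75
```

With real data, `bootstrap_ci(labels, scores)` would be called on the held-out predictions from the outer cross-validation folds; on a four-patient toy set the interval is too wide to be informative.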

Conclusions: Machine learning algorithms applied to data in the electronic health record can learn models that accurately identify PAD patients at risk of future major adverse cardiac and cerebrovascular events, highlighting the great potential of electronic health records to provide automated risk stratification for cardiovascular diseases. Common data models that enable cross-institution research and technology development could be an important aspect of widespread adoption of newer risk-stratification models.

Keywords: electronic health records; machine learning; mortality; peripheral arterial disease; risk.

Publication types

  • Multicenter Study
  • Research Support, N.I.H., Extramural

MeSH terms

  • Aged
  • Aged, 80 and over
  • Cerebrovascular Disorders / diagnosis
  • Cerebrovascular Disorders / epidemiology*
  • Data Mining*
  • Electronic Health Records*
  • Female
  • Heart Diseases / diagnosis
  • Heart Diseases / epidemiology*
  • Humans
  • Machine Learning*
  • Male
  • Middle Aged
  • Peripheral Arterial Disease / diagnosis
  • Peripheral Arterial Disease / epidemiology*
  • Prognosis
  • Risk Assessment
  • Risk Factors
  • Tertiary Care Centers
  • Time Factors
  • United States / epidemiology