Background: Patients with peripheral artery disease (PAD) are at risk of major adverse cardiac and cerebrovascular events. There are no readily available risk scores that can accurately identify which patients are most likely to sustain an event, making it difficult to identify those who might benefit from more aggressive intervention. Thus, we aimed to develop a novel predictive model-using machine learning methods on electronic health record data-to identify which PAD patients are most likely to develop major adverse cardiac and cerebrovascular events.
Methods and results: Data were derived from patients diagnosed with PAD at 2 tertiary care institutions. Predictive models were built using a common data model that allowed for utilization of both structured (coded) and unstructured (text) data. Only data from time of entry into the health system up to PAD diagnosis were used for modeling. Models were developed and tested using nested cross-validation. A total of 7686 patients were included in learning our predictive models. Utilizing almost 1000 variables, our best predictive model accurately determined which PAD patients would go on to develop major adverse cardiac and cerebrovascular events with an area under the curve of 0.81 (95% CI, 0.80-0.83).
Conclusions: Machine learning algorithms applied to data in the electronic health record can learn models that accurately identify PAD patients at risk of future major adverse cardiac and cerebrovascular events, highlighting the great potential of electronic health records to provide automated risk stratification for cardiovascular diseases. Common data models that can enable cross-institution research and technology development could potentially be an important aspect of widespread adoption of newer risk-stratification models.
Keywords: electronic health records; machine learning; mortality; peripheral arterial disease; risk.