Learning from Longitudinal Data in Electronic Health Record and Genetic Data to Improve Cardiovascular Event Prediction

Juan Zhao; QiPing Feng; Patrick Wu; Roxana A Lupu; Russell A Wilke; Quinn S Wells; Joshua C Denny; Wei-Qi Wei

doi:10.1038/s41598-018-36745-x

Sci Rep. 2019 Jan 24;9(1):717. doi: 10.1038/s41598-018-36745-x.

Authors

Juan Zhao¹, QiPing Feng², Patrick Wu¹,³, Roxana A Lupu⁴, Russell A Wilke⁴, Quinn S Wells⁵, Joshua C Denny¹,⁵, Wei-Qi Wei⁶

Affiliations

¹ Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA.
² Division of Clinical Pharmacology, Vanderbilt University Medical Center, Nashville, TN, USA.
³ Medical Scientist Training Program, Vanderbilt University School of Medicine, Nashville, TN, USA.
⁴ Department of Medicine, University of South Dakota Sanford School of Medicine, Sioux Falls, SD, USA.
⁵ Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA.
⁶ Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA. wei-qi.wei@vumc.org.

Abstract

Current approaches to predicting a cardiovascular disease (CVD) event rely on conventional risk factors and cross-sectional data. In this study, we applied machine learning and deep learning models to 10-year CVD event prediction using longitudinal electronic health record (EHR) and genetic data. Our study cohort included 109,490 individuals. In the first experiment, we extracted aggregated and longitudinal features from the EHR. We applied logistic regression, random forests, gradient boosting trees, convolutional neural networks (CNN) and recurrent neural networks with long short-term memory (LSTM) units. In the second experiment, we applied a late-fusion approach to incorporate genetic features. We compared performance against the approach currently used in routine clinical practice, the American College of Cardiology and American Heart Association (ACC/AHA) Pooled Cohort Risk Equation. Our results indicated that incorporating longitudinal features led to better event prediction. Combining genetic features through a late-fusion approach can further improve CVD prediction, underscoring the importance of integrating relevant genetic data whenever available.
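
The abstract does not spell out how the late-fusion step was built, so the following is only a minimal, generic sketch of the idea: one model is fit per data source, and a small fusion model is then fit on the two predicted risks. Everything here is an assumption made for illustration, including the function names (`fit_logistic`, `late_fusion_fit`, `late_fusion_predict`), the use of plain logistic regression for every stage, and the synthetic data. It is not the authors' implementation.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=200):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, X):
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def late_fusion_fit(X_ehr, X_gen, y):
    # Stage 1: one model per modality, each trained on its own features only.
    ehr_model = fit_logistic(X_ehr, y)
    gen_model = fit_logistic(X_gen, y)
    # Stage 2: the fusion model sees only the two per-modality risk scores.
    # A real pipeline would use held-out (out-of-fold) scores here to avoid leakage.
    scores = [[pe, pg] for pe, pg in zip(predict_proba(ehr_model, X_ehr),
                                         predict_proba(gen_model, X_gen))]
    fusion_model = fit_logistic(scores, y)
    return ehr_model, gen_model, fusion_model


def late_fusion_predict(models, X_ehr, X_gen):
    ehr_model, gen_model, fusion_model = models
    scores = [[pe, pg] for pe, pg in zip(predict_proba(ehr_model, X_ehr),
                                         predict_proba(gen_model, X_gen))]
    return predict_proba(fusion_model, scores)


# Synthetic stand-ins for EHR-derived and genetic feature vectors (hypothetical data).
random.seed(0)
n = 200
X_ehr = [[random.gauss(0, 1) for _ in range(3)] for _ in range(n)]
X_gen = [[random.gauss(0, 1) for _ in range(2)] for _ in range(n)]
y = [1 if (e[0] + 0.5 * g[0] + random.gauss(0, 0.5)) > 0 else 0
     for e, g in zip(X_ehr, X_gen)]

models = late_fusion_fit(X_ehr, X_gen, y)
probs = late_fusion_predict(models, X_ehr, X_gen)
```

In this sketch the modalities are combined only at the level of predicted risk, which is what distinguishes late fusion from concatenating raw EHR and genetic features into a single model (early fusion).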

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Adult
Algorithms
Cardiovascular Diseases / diagnosis*
Cardiovascular Diseases / epidemiology
Cardiovascular Diseases / etiology
Case-Control Studies
Cross-Sectional Studies
Deep Learning*
Electronic Health Records / statistics & numerical data*
Female
Genetic Variation*
Humans
Longitudinal Studies
Machine Learning*
Male
Neural Networks, Computer
Risk Factors
United States / epidemiology
