Background: Use of the electronic health record (EHR) is expected to increase rapidly in the near future, yet little research has examined whether analyzing internal EHR data with flexible, adaptive statistical methods could improve clinical risk prediction. The extensive implementation of the EHR in the Veterans Health Administration provides an opportunity to explore this question.
Objectives: To compare the performance of various approaches for predicting risk of cerebrovascular and cardiovascular (CCV) death, using traditional risk predictors versus more comprehensive EHR data.
Research design: Retrospective cohort study. We identified all Veterans Health Administration patients without recent CCV events who were treated at 12 facilities from 2003 to 2007, and predicted their risk using the Framingham risk score, logistic regression, generalized additive modeling, and gradient tree boosting.
Measures: The outcome was CCV-related death within 5 years. We assessed each method's predictive performance with the area under the receiver operating characteristic curve (AUC), the Hosmer-Lemeshow goodness-of-fit test, plots of estimated risk, and reclassification tables, using cross-validation to penalize overfitting.
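The discrimination and calibration measures named above can be illustrated with a minimal, dependency-free Python sketch. It assumes a binary outcome y (1 = CCV-related death within 5 years, 0 otherwise) and predicted risks p that were produced out of fold, meaning each patient's risk comes from a model fit without that patient's fold. All function names here (kfold_indices, auc, hosmer_lemeshow, chi2_sf_even_df) are hypothetical illustrations, not the authors' code. The Hosmer-Lemeshow variant shown groups patients into deciles of predicted risk and uses g - 2 degrees of freedom; the closed-form p-value is valid only for an even number of degrees of freedom.

```python
import math
import random


def kfold_indices(n, k=10, seed=0):
    """Split indices 0..n-1 into k shuffled folds (hypothetical helper).

    Fitting on k-1 folds and predicting the held-out fold yields
    out-of-fold risks, so the metrics below are not inflated by overfitting.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def auc(y, p):
    """AUC as the probability that a random case has higher predicted risk
    than a random non-case (ties count one half). O(n^2), for illustration."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow statistic over groups ordered by predicted risk.

    Returns (statistic, degrees of freedom = groups - 2).
    """
    order = sorted(range(len(p)), key=lambda i: p[i])
    n = len(order)
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue
        n_g = len(idx)
        observed = sum(y[i] for i in idx)
        expected = sum(p[i] for i in idx)
        # Events plus non-events terms simplify to (O - E)^2 / (E * (1 - E / n_g)).
        denom = expected * (1.0 - expected / n_g)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat, groups - 2


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total
```

With 10 risk groups the statistic has 8 degrees of freedom, so `chi2_sf_even_df(stat, 8)` gives the goodness-of-fit p-value; a small p-value indicates poor calibration.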
Results: Regression methods outperformed the Framingham risk score, even with the same predictors (AUC increased from 71% to 73% and calibration also improved). Even better performance was attained in models using additional EHR-derived predictor variables (AUC increased to 78% and net reclassification improvement was as large as 0.29). Nonparametric regression further improved calibration and discrimination compared with logistic regression.
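The net reclassification improvement (NRI) reported above summarizes how a new model moves patients between risk categories relative to an old one: upward moves are credited for patients who died and downward moves for those who did not. A minimal sketch of the standard categorical NRI follows; the function names and the cutoffs (0.05, 0.10, 0.20) are hypothetical placeholders for illustration, not the risk thresholds used in this study.

```python
def risk_category(p, cutoffs):
    """Index of the risk category containing p, given ascending cutoffs.

    With cutoffs (0.05, 0.10, 0.20): <0.05 -> 0, [0.05, 0.10) -> 1,
    [0.10, 0.20) -> 2, >=0.20 -> 3.
    """
    return sum(p >= c for c in cutoffs)


def net_reclassification_improvement(y, p_old, p_new, cutoffs=(0.05, 0.10, 0.20)):
    """Categorical NRI of p_new versus p_old for binary outcome y.

    NRI = [P(up | event) - P(down | event)] + [P(down | non-event) - P(up | non-event)]
    """
    up_e = down_e = up_n = down_n = 0
    n_events = sum(y)
    n_nonevents = len(y) - n_events
    for yi, po, pn in zip(y, p_old, p_new):
        c_old = risk_category(po, cutoffs)
        c_new = risk_category(pn, cutoffs)
        if c_new > c_old:          # moved to a higher risk category
            if yi:
                up_e += 1
            else:
                up_n += 1
        elif c_new < c_old:        # moved to a lower risk category
            if yi:
                down_e += 1
            else:
                down_n += 1
    return (up_e - down_e) / n_events + (down_n - up_n) / n_nonevents
```

Because the NRI depends on the chosen cutoffs, values such as 0.29 are only comparable across studies that use the same risk categories.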
Conclusions: Although the EHR lacks some risk factors and has imperfect data quality, health care systems may be able to substantially improve risk prediction for their patients by using internally developed EHR-derived models and flexible statistical methodology.