Objective: Incorporating accurate life expectancy predictions into clinical decision making could improve quality and decrease costs, but few providers do this. We sought to use predictive data mining and high dimensional analytics of electronic health record (EHR) data to develop a highly accurate and clinically actionable 5 year life expectancy index.
Materials and methods: We developed the index using EHR data for 7463 patients ≥50 years old with ≥1 visit in 2003 to a large, academic, multispecialty group practice. We extracted 980 attributes from the EHRs of the practices and affiliated hospitals. Correlation feature selection with greedy stepwise search was used to find the attribute subset with the best average merit. Rotation forest ensembling with alternating decision trees as the underlying classifier was used to predict 5 year mortality. Model performance was compared with the modified Charlson Comorbidity Index and the Walter life expectancy method.
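To illustrate the attribute selection step, the following is a minimal sketch of correlation feature selection with a greedy forward search. It uses the standard merit heuristic, merit = k·r_cf / sqrt(k + k(k−1)·r_ff), where k is the subset size, r_cf the mean attribute-outcome correlation, and r_ff the mean attribute-attribute correlation. The function names, the use of absolute Pearson correlation as the association measure, and the forward search direction are assumptions made for illustration; they are not taken from the study's implementation.

```python
import numpy as np


def cfs_merit(corr_fc, corr_ff, subset):
    """CFS merit of a subset: k*r_cf / sqrt(k + k*(k-1)*r_ff)."""
    idx = list(subset)
    k = len(idx)
    if k == 0:
        return 0.0
    r_cf = corr_fc[idx].mean()  # mean attribute-outcome correlation
    if k == 1:
        r_ff = 0.0
    else:
        sub = corr_ff[np.ix_(idx, idx)]
        # mean of the off-diagonal attribute-attribute correlations
        r_ff = sub[~np.eye(k, dtype=bool)].mean()
    return float(k * r_cf / np.sqrt(k + k * (k - 1) * r_ff))


def greedy_forward_cfs(X, y):
    """Add one attribute at a time; stop when merit no longer improves."""
    n_features = X.shape[1]
    # Assumption: absolute Pearson correlation as the association measure.
    corr_fc = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_features)])
    corr_ff = np.abs(np.corrcoef(X, rowvar=False))
    selected, best = [], 0.0
    while len(selected) < n_features:
        candidates = [j for j in range(n_features) if j not in selected]
        scores = [cfs_merit(corr_fc, corr_ff, selected + [j]) for j in candidates]
        top = int(np.argmax(scores))
        if scores[top] <= best:
            break  # no candidate improves the merit
        selected.append(candidates[top])
        best = scores[top]
    return selected, best
```

The merit rewards attributes that correlate with the outcome and penalizes attributes that are redundant with ones already chosen, which is how a large candidate pool can shrink to a small, non-redundant subset.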
Results: Within 5 years of the last visit in 2003, 838 (11%) patients had died. The final model included 24 attributes: two demographic (age, sex), 10 comorbidity (eg, cardiovascular disease), one vital sign (mean diastolic blood pressure), two medications (loop diuretic use, digoxin use), six laboratory (eg, mean albumin), and three healthcare utilization (eg, the number of hospitalizations 1 year prior to the last visit in 2003). The index showed very good discrimination (c-statistic 0.86) and outperformed both comparators.
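For readers unfamiliar with the c-statistic, the sketch below shows one common way to compute it for a binary outcome such as 5 year mortality: the proportion of (died, survived) patient pairs in which the patient who died received the higher predicted risk, with ties counted as one half. The function name and the toy inputs are hypothetical and are not drawn from the study data.

```python
import numpy as np


def c_statistic(died, risk):
    """Concordance (c-statistic) for a binary outcome.

    died: 1/True if the patient died within the horizon, else 0/False.
    risk: predicted risk score; higher means higher predicted mortality.
    """
    died = np.asarray(died, dtype=bool)
    risk = np.asarray(risk, dtype=float)
    pos = risk[died]    # scores of patients who died
    neg = risk[~died]   # scores of patients who survived
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one patient in each outcome group")
    # Compare every (died, survived) pair of scores.
    diff = pos[:, None] - neg[None, :]
    concordant = (diff > 0).sum()
    ties = (diff == 0).sum()
    return float((concordant + 0.5 * ties) / diff.size)


# Toy example: 3 of the 4 (died, survived) pairs are ranked correctly.
print(c_statistic([1, 1, 0, 0], [0.9, 0.3, 0.4, 0.1]))  # 0.75
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.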
Conclusions: The EHR based index successfully distinguished adults ≥50 years old with life expectancy >5 years from those with life expectancy ≤5 years. This information could be used clinically to optimize preventive service use (eg, cancer screening in the elderly).