Development and Validation of Machine Learning Models for Prediction of 1-Year Mortality Utilizing Electronic Medical Record Data Available at the End of Hospitalization in Multicondition Patients: a Proof-of-Concept Study

J Gen Intern Med. 2018 Jun;33(6):921-928. doi: 10.1007/s11606-018-4316-y. Epub 2018 Jan 30.


Background: Predicting death in a cohort of clinically diverse, multicondition hospitalized patients is difficult. Prognostic models that use electronic medical record (EMR) data to determine 1-year death risk can improve end-of-life planning and risk adjustment for research.

Objective: Determine if the final set of demographic, vital sign, and laboratory data from a hospitalization can be used to accurately quantify 1-year mortality risk.

Design: A retrospective study using electronic medical record data linked with the state death registry.

Participants: A total of 59,848 hospitalized patients within a six-hospital network over a 4-year period.

Main measures: The last set of vital signs, complete blood count, basic and complete metabolic panel, demographic information, and ICD codes. The outcome of interest was death within 1 year.

Key results: Model performance was measured on the validation data set. Random forests (RF) outperformed logistic regression (LR) models in discriminative ability. An RF model that used the final set of demographic, vitals, and laboratory data from the final 48 h of hospitalization had an AUC of 0.86 (0.85-0.87) for predicting death within a year. Age, blood urea nitrogen, platelet count, hemoglobin, and creatinine were the most important variables in the RF model. Models that used comorbidity variables alone had the lowest AUC. In groups of patients with a high probability of death, RF models underestimated the probability by less than 10%.
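Illustrative note: the two quantities reported above, discrimination (AUC) and the gap between predicted and observed death rates within risk groups, can be sketched in a few lines of Python. This is a minimal sketch, not the study's code: the function names `auc` and `calibration_by_group`, the equal-size grouping, and the toy labels and scores are all assumptions made for illustration, and the numbers are invented rather than taken from the study.

```python
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen patient who died (label 1)
    has a higher predicted risk than a randomly chosen survivor (label 0).
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def calibration_by_group(
    labels: Sequence[int], scores: Sequence[float], n_groups: int = 2
) -> List[Tuple[float, float]]:
    """Sort patients by predicted risk, split them into n_groups groups of
    (nearly) equal size, and return (mean predicted risk, observed death rate)
    for each group, lowest-risk group first."""
    pairs = sorted(zip(scores, labels))
    size = len(pairs) // n_groups
    if size == 0:
        raise ValueError("more groups than patients")
    result = []
    for g in range(n_groups):
        start = g * size
        # The last group absorbs any remainder.
        end = len(pairs) if g == n_groups - 1 else start + size
        chunk = pairs[start:end]
        mean_pred = sum(s for s, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        result.append((mean_pred, observed))
    return result


# Hypothetical toy data: 1 = died within 1 year, 0 = survived.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.5, 0.9]

print(auc(labels, scores))
for mean_pred, observed in calibration_by_group(labels, scores, n_groups=2):
    # A positive gap means the model underestimates risk in that group.
    print(round(mean_pred, 3), round(observed, 3), round(observed - mean_pred, 3))
```

In this toy example the higher-risk group has a mean predicted risk slightly below its observed death rate, which is the kind of underestimation the key results describe. The pairwise AUC loop is quadratic in cohort size, so on a cohort of this study's scale a rank-based implementation would be the practical choice.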

Conclusion: The last set of EMR data from a hospitalization can be used to accurately estimate the risk of 1-year mortality within a cohort of multicondition hospitalized patients.

Keywords: data mining; hospital outcomes; machine learning; predictive models.

Publication types

  • Multicenter Study
  • Research Support, N.I.H., Extramural

MeSH terms

  • Adult
  • Aged
  • Aged, 80 and over
  • Cohort Studies
  • Data Analysis
  • Electronic Health Records / standards*
  • Electronic Health Records / trends
  • Female
  • Forecasting
  • Hospitalization* / trends
  • Humans
  • Machine Learning / standards*
  • Machine Learning / trends
  • Male
  • Middle Aged
  • Models, Theoretical*
  • Mortality* / trends
  • Proof of Concept Study*
  • Reproducibility of Results
  • Retrospective Studies
  • Risk Factors