Early hospital mortality prediction of intensive care unit patients using an ensemble learning approach

Int J Med Inform. 2017 Dec;108:185-195. doi: 10.1016/j.ijmedinf.2017.10.002. Epub 2017 Oct 5.


Background: Mortality prediction of hospitalized patients is an important problem. Over the past few decades, several severity scoring systems and machine learning models have been developed for predicting hospital mortality. By contrast, early mortality prediction for intensive care unit patients remains an open challenge. Most research has focused on severity-of-illness scoring systems or data mining (DM) models designed for risk estimation at least 24 or 48h after ICU admission.

Objectives: This study highlights the main data challenges in early mortality prediction in ICU patients and introduces a new machine learning based framework for Early Mortality Prediction for Intensive Care Unit patients (EMPICU).

Materials and methods: The proposed method is evaluated on the Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II) database. A total of 11,722 patients with single ICU stays are considered; mortality prediction models are developed only for patients aged 16 years or older in the Medical ICU (MICU), Surgical ICU (SICU) or Cardiac Surgery Recovery Unit (CSRU). We employ the ensemble learning Random Forest (RF), the predictive Decision Trees (DT), the probabilistic Naive Bayes (NB) and the rule-based Projective Adaptive Resonance Theory (PART) models. The primary outcome is hospital mortality. The explanatory variables include demographic, physiological, vital sign and laboratory test variables. Performance is measured using the cross-validated area under the receiver operating characteristic curve (AUROC) to minimize bias.
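The following is a minimal sketch, in plain Python with no third-party libraries, of how a cross-validated AUROC can be computed for one of the listed model families, a Gaussian Naive Bayes classifier. The stratified 5-fold split, the Gaussian likelihood, the class name `TinyGaussianNB` and the synthetic two-feature data are all assumptions made for illustration. They are not the study's actual pipeline, variables or results.

```python
import math
import random
from statistics import mean, pvariance


def auroc(labels, scores):
    """AUROC as P(score of a random death > score of a random survivor); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


class TinyGaussianNB:
    """Hypothetical minimal Gaussian Naive Bayes for labels 0 (survived) / 1 (died)."""

    def fit(self, X, y):
        self.stats = {}
        for c in (0, 1):
            rows = [x for x, label in zip(X, y) if label == c]
            cols = list(zip(*rows))  # one tuple per feature
            self.stats[c] = (
                len(rows) / len(X),                       # class prior
                [mean(col) for col in cols],              # per-feature mean
                [pvariance(col) + 1e-9 for col in cols],  # per-feature variance (smoothed)
            )
        return self

    def _log_joint(self, x, c):
        prior, mus, variances = self.stats[c]
        ll = math.log(prior)
        for xi, mu, var in zip(x, mus, variances):
            ll += -0.5 * math.log(2 * math.pi * var) - (xi - mu) ** 2 / (2 * var)
        return ll

    def predict_proba(self, x):
        """Posterior P(died | x), normalised in log space for numerical stability."""
        l0, l1 = self._log_joint(x, 0), self._log_joint(x, 1)
        m = max(l0, l1)
        e0, e1 = math.exp(l0 - m), math.exp(l1 - m)
        return e1 / (e0 + e1)


def stratified_folds(y, k, seed=0):
    """Split indices into k folds with roughly the same death rate in each (assumed scheme)."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for c in (0, 1):
        idx = [i for i, label in enumerate(y) if label == c]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def cross_validated_auroc(X, y, k=5):
    """Mean AUROC over k folds; each fold is scored by a model fit only on the other folds."""
    aucs = []
    for test_idx in stratified_folds(y, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = TinyGaussianNB().fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores = [model.predict_proba(X[i]) for i in test_idx]
        aucs.append(auroc([y[i] for i in test_idx], scores))
    return mean(aucs)


# Synthetic, imbalanced demo data (about 15% deaths); NOT MIMIC-II values.
rng = random.Random(42)
X, y = [], []
for _ in range(500):
    died = 1 if rng.random() < 0.15 else 0
    # Two hypothetical early-window features, shifted upward for non-survivors.
    X.append([rng.gauss(1.0 * died, 1.0), rng.gauss(0.5 * died, 1.0)])
    y.append(died)

cv_auc = cross_validated_auroc(X, y)
print(f"5-fold cross-validated AUROC: {cv_auc:.3f}")
```

You could replace `TinyGaussianNB` with a random forest or decision tree, for example from a machine learning library, and the fold and AUROC logic would stay the same. The estimate avoids an optimistic bias because AUROC is computed only on held-out folds, never on the patients used for fitting.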

Results: The proposed EMPICU framework outperformed standard scoring systems (SOFA, SAPS-I, APACHE-II, NEWS and qSOFA) in terms of both AUROC and time, providing predictions at 6h after admission compared with 48h or more.

Discussion and conclusion: The results show that although many values are missing in the first few hours of ICU admission, there is enough signal to predict mortality effectively during the first 6h. The proposed framework, in particular its ensemble learning variant, EMPICU Random Forest (EMPICU-RF), offers a basis for constructing an effective and novel mortality prediction model in the early hours of an ICU patient's admission, with an improved performance profile.
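The abstract notes that many values are missing in the first hours but that a 6h window still carries enough signal. One common way to turn sparse early measurements into model inputs is to average each variable within the window and add an explicit missingness indicator. This is an assumption, not the EMPICU preprocessing. The record format, the variable names in `VARIABLES` and the patient values below are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical record format: (hours since ICU admission, variable name, value).
Measurement = Tuple[float, str, float]

# Hypothetical variable list; not the study's actual explanatory variables.
VARIABLES = ["heart_rate", "sys_bp", "lactate"]


def early_window_features(measurements: List[Measurement], window_h: float = 6.0) -> Dict[str, float]:
    """Summarize each variable over [0, window_h) hours as its mean plus a 0/1 missingness flag.

    A variable never measured in the window gets mean = NaN and missing = 1.0, so the
    absence of a test is kept as information instead of dropping the patient.
    """
    features: Dict[str, float] = {}
    for var in VARIABLES:
        values = [v for t, name, v in measurements if name == var and 0.0 <= t < window_h]
        features[f"{var}_mean"] = sum(values) / len(values) if values else math.nan
        features[f"{var}_missing"] = 0.0 if values else 1.0
    return features


patient = [
    (0.5, "heart_rate", 110.0),
    (2.0, "heart_rate", 96.0),
    (1.0, "sys_bp", 88.0),
    (7.5, "lactate", 4.1),  # measured after the 6 h window, so ignored
]
print(early_window_features(patient))
# {'heart_rate_mean': 103.0, 'heart_rate_missing': 0.0, 'sys_bp_mean': 88.0, 'sys_bp_missing': 0.0, 'lactate_mean': nan, 'lactate_missing': 1.0}
```

The NaN means would still need imputation, or a model that tolerates missing values, before training. The indicator columns let the classifier learn whether an unmeasured test is itself informative.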

Keywords: Class imbalance; Classification; Intensive care; Mortality prediction; Random Forest.

MeSH terms

  • Adolescent
  • Adult
  • Aged
  • Aged, 80 and over
  • Bayes Theorem
  • Databases, Factual
  • Female
  • Heart Diseases / mortality*
  • Heart Diseases / surgery
  • Hospital Mortality / trends*
  • Humans
  • Intensive Care Units / statistics & numerical data*
  • Machine Learning*
  • Male
  • Middle Aged
  • Outcome Assessment, Health Care*
  • ROC Curve
  • Severity of Illness Index*
  • Young Adult