Observational Study
Crit Care Med. 2016 Feb;44(2):368-74.
doi: 10.1097/CCM.0000000000001571.

Multicenter Comparison of Machine Learning Methods and Conventional Regression for Predicting Clinical Deterioration on the Wards

Free PMC article

Matthew M Churpek et al. Crit Care Med. 2016 Feb.


Objective: Machine learning methods are flexible prediction algorithms that may be more accurate than conventional regression. We compared the accuracy of different techniques for detecting clinical deterioration on the wards in a large, multicenter database.

Design: Observational cohort study.

Setting: Five hospitals, from November 2008 until January 2013.

Patients: Hospitalized ward patients.

Interventions: None.

Measurements and main results: Demographic variables, laboratory values, and vital signs were used in a discrete-time survival analysis framework to predict the composite outcome of cardiac arrest, intensive care unit (ICU) transfer, or death. Two logistic regression models (one using linear predictor terms and a second using restricted cubic splines) were compared with several machine learning methods. The models were derived in the first 60% of the data by date and validated in the remaining 40%. For model derivation, each event time window was matched to a non-event window. All models were compared with each other and with the Modified Early Warning Score (MEWS), a commonly cited early warning score, using the area under the receiver operating characteristic curve (AUC). A total of 269,999 patients were admitted, and 424 cardiac arrests, 13,188 ICU transfers, and 2,840 deaths occurred during the study. In the validation dataset, the random forest model was the most accurate (AUC, 0.80 [95% CI, 0.80-0.80]). The logistic regression model with spline predictors was more accurate than the model with linear predictors (AUC, 0.77 vs 0.74; p < 0.01), and all models were more accurate than the MEWS (AUC, 0.70 [95% CI, 0.70-0.70]).
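The derivation steps described above (splitting each stay into discrete time windows, a 60/40 temporal split by date, and matching each event window to a non-event window) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the `Admission` record, the 8-hour window length, and the random 1:1 choice of non-event windows are all hypothetical, since the abstract gives neither the window length nor the matching criteria.

```python
import random
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


# Hypothetical record for one ward admission (not the study's data schema).
@dataclass
class Admission:
    patient_id: str
    admit_time: datetime
    discharge_time: datetime
    event_time: Optional[datetime] = None  # first of cardiac arrest, ICU transfer, or death


Window = Tuple[str, datetime, int]  # (patient_id, window_start, outcome_label)


def to_windows(adm: Admission, window_hours: int = 8) -> List[Window]:
    """Split a stay into consecutive windows; the window containing the event is labeled 1.

    No windows are generated after the event, as in a discrete-time survival setup.
    The 8-hour default is an assumption for illustration only.
    """
    step = timedelta(hours=window_hours)
    end = adm.event_time if adm.event_time is not None else adm.discharge_time
    windows: List[Window] = []
    start = adm.admit_time
    while start < end:
        stop = start + step
        # Only the final window of a stay that ends in an event gets label 1.
        label = int(adm.event_time is not None and stop >= end)
        windows.append((adm.patient_id, start, label))
        start = stop
    return windows


def temporal_split(admissions: List[Admission], derivation_fraction: float = 0.6):
    """Derive on the earliest admissions by date and validate on the remainder."""
    ordered = sorted(admissions, key=lambda a: a.admit_time)
    cut = int(len(ordered) * derivation_fraction)
    return ordered[:cut], ordered[cut:]


def match_windows(windows: List[Window], seed: int = 0) -> List[Window]:
    """Pair each event window with one randomly drawn non-event window (assumed 1:1 random matching)."""
    rng = random.Random(seed)
    events = [w for w in windows if w[2] == 1]
    non_events = [w for w in windows if w[2] == 0]
    controls = rng.sample(non_events, k=min(len(events), len(non_events)))
    return events + controls
```

Matching is applied only to the derivation windows here; the validation windows would be scored as they occur, which mirrors the abstract's statement that matching was used "for model derivation."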

Conclusions: In this multicenter study, we found that several machine learning methods more accurately predicted clinical deterioration than logistic regression. Use of detection algorithms derived from these techniques may result in improved identification of critically ill patients on the wards.


Figure 1. Areas under the receiver operating characteristic curve of the compared methods for the composite outcome in the validation cohort*
*Error bars indicate the upper 95% confidence intervals. Abbreviations: MEWS: Modified Early Warning Score; AUC: area under the receiver operating characteristic curve
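Figure 1 reports each model's AUC with a 95% confidence interval. The AUC below is computed as the Mann-Whitney probability that a randomly chosen event observation scores higher than a randomly chosen non-event observation, with ties counted as one half. The abstract does not say how the confidence intervals were obtained, so this sketch assumes a percentile bootstrap; the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mann-Whitney AUC: share of (event, non-event) pairs ranked correctly, ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one event and one non-event")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(
    scores: Sequence[float],
    labels: Sequence[int],  # must be 0/1
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for the AUC (an assumed method, not stated in the abstract)."""
    auc(scores, labels)  # validates that both classes are present before resampling
    rng = random.Random(seed)
    n = len(scores)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one of the classes
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int(n_boot * alpha / 2)]
    upper = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper
```

The pairwise loop is quadratic in the number of observations; at the scale of this study a rank-based formula for the same statistic would be used instead, but the loop keeps the definition visible.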
Figure 2. Sensitivity of the Modified Early Warning Score, the logistic regression models, and the random forest model as a function of the percentage of observations above a score threshold (i.e., screening positive) in the validation cohort
Abbreviations: MEWS: Modified Early Warning Score
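Figure 2 plots sensitivity against the share of observations that screen positive. One way to compute points on such a curve is to rank observations by risk score, flag the top percentage, and count how many events fall among the flagged observations. This is a minimal sketch; the function names are hypothetical and the tie handling (input order is kept among equal scores) is an arbitrary choice, not necessarily the one used for the figure.

```python
import math
from typing import List, Sequence, Tuple


def sensitivity_at_flag_rate(scores: Sequence[float], labels: Sequence[int], pct_flagged: float) -> float:
    """Sensitivity when the top pct_flagged percent of observations (by score) screen positive."""
    n = len(scores)
    events = sum(labels)
    if events == 0:
        raise ValueError("need at least one event")
    k = math.ceil(n * pct_flagged / 100)
    # Rank from highest to lowest score; the sort is stable, so ties keep input order.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    caught = sum(labels[i] for i in order[:k])
    return caught / events


def sensitivity_curve(scores: Sequence[float], labels: Sequence[int], pcts=range(0, 101, 5)) -> List[Tuple[float, float]]:
    """(percent flagged, sensitivity) pairs suitable for a Figure 2 style plot."""
    return [(p, sensitivity_at_flag_rate(scores, labels, p)) for p in pcts]
```

A model whose curve rises faster detects more events for the same workload of positive screens, which is the comparison Figure 2 draws between the MEWS and the fitted models.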
Figure 3. Importance of the predictor variables in the random forest model, scaled to a maximum of 100
Abbreviations: BUN: blood urea nitrogen; AVPU: alert, responsive to voice, responsive to pain, unresponsive; SGOT: serum glutamic oxaloacetic transaminase; ICU: intensive care unit
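Figure 3 rescales variable importance so that the top predictor equals 100. The abstract does not state which importance measure the random forest used, so the sketch below assumes permutation importance (the mean drop in AUC when one predictor column is shuffled), which is one common choice, and then applies the max-100 scaling. `predict` is a hypothetical stand-in for any fitted model's scoring function.

```python
import random
from typing import Callable, List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mann-Whitney AUC: share of (event, non-event) pairs ranked correctly, ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(
    predict: Callable[[List[float]], float],
    rows: List[List[float]],
    labels: Sequence[int],
    n_repeats: int = 5,
    seed: int = 0,
) -> List[float]:
    """Mean drop in AUC when each predictor column is shuffled (an assumed importance measure)."""
    rng = random.Random(seed)
    baseline = auc([predict(r) for r in rows], labels)
    drops: List[float] = []
    for j in range(len(rows[0])):
        total = 0.0
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Replace column j with its shuffled values, leaving other predictors intact.
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            total += baseline - auc([predict(r) for r in permuted], labels)
        drops.append(total / n_repeats)
    return drops


def scale_to_max_100(importances: Sequence[float]) -> List[float]:
    """Rescale so the most important predictor scores 100, as in Figure 3."""
    top = max(importances)
    if top <= 0:
        raise ValueError("largest importance must be positive to rescale")
    return [100.0 * v / top for v in importances]
```

Only the final rescaling step is stated in the caption; the choice of importance measure changes the raw values and can change the ranking.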
Figure 4. Partial plots of the effect of respiratory rate (A), heart rate (B), age (C), and systolic blood pressure (D) on the risk of the composite outcome across the range of each variable in the random forest model.
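A partial plot such as Figure 4 shows how a model's average prediction changes as one predictor is varied while every other predictor keeps its observed value. The sketch below is the generic partial-dependence computation; `predict` is a hypothetical stand-in for the fitted random forest, and the output is left on the raw prediction scale because the scale used in the figure is not stated here.

```python
from typing import Callable, List, Sequence


def partial_dependence(
    predict: Callable[[List[float]], float],
    rows: Sequence[Sequence[float]],
    feature: int,
    grid: Sequence[float],
) -> List[float]:
    """Average prediction when `feature` is set to each grid value for every row."""
    curve: List[float] = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature] = value  # override one predictor, keep the rest as observed
            total += predict(modified)
        curve.append(total / len(rows))
    return curve
```

For panel A, for example, `grid` would be a range of respiratory rates and `feature` the column holding respiratory rate; both are illustrative placeholders.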
