Identifying Predictors Associated with Risk of Death or Admission to Intensive Care Unit in Internal Medicine Patients with Sepsis: A Comparison of Statistical Models and Machine Learning Algorithms

Antibiotics (Basel). 2023 May 18;12(5):925. doi: 10.3390/antibiotics12050925.

Abstract

Background: Sepsis is a time-dependent disease: the early recognition of patients at risk for a poor outcome is mandatory. Aim: To identify prognostic predictors of the risk of death or admission to intensive care units in a consecutive sample of septic patients, comparing different statistical models and machine learning algorithms. Methods: Retrospective study including 148 patients discharged from an Italian internal medicine unit with a diagnosis of sepsis/septic shock and microbiological identification. Results: Of the total, 37 (25.0%) patients reached the composite outcome. The sequential organ failure assessment (SOFA) score at admission (odds ratio (OR): 1.83; 95% confidence interval (CI): 1.41-2.39; p < 0.001), delta SOFA (OR: 1.64; 95% CI: 1.28-2.10; p < 0.001), and the alert, verbal, pain, unresponsive (AVPU) status (OR: 5.96; 95% CI: 2.13-16.67; p < 0.001) were identified through the multivariable logistic model as independent predictors of the composite outcome. The area under the receiver operating characteristic curve (AUC) was 0.894 (95% CI: 0.840-0.948). In addition, different statistical models and machine learning algorithms identified further predictive variables: delta quick-SOFA, delta-procalcitonin, the Mortality in Emergency Department Sepsis (MEDS) score, mean arterial pressure, and the Glasgow Coma Scale. The cross-validated multivariable logistic model with the least absolute shrinkage and selection operator (LASSO) penalty identified 5 predictors, and recursive partitioning and regression tree (RPART) identified 4 predictors, with higher AUCs (0.915 and 0.917, respectively); the random forest (RF) approach, including all evaluated variables, obtained the highest AUC (0.978). All models were well calibrated. Conclusions: Although structurally different, each model identified similar predictive covariates. The classical multivariable logistic regression model was the most parsimonious and best calibrated, while RPART was the easiest to interpret clinically. Finally, LASSO and RF were the costliest in terms of the number of variables identified.
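
As a reading aid, the odds ratios reported above can be converted back to the log-odds scale on which a logistic model is fitted. The sketch below is illustrative only and is not the authors' analysis code. It assumes the reported 95% CIs are Wald intervals, symmetric on the log scale, so that the coefficient is ln(OR) and its standard error is approximately (ln(upper) - ln(lower)) / (2 × 1.96). The names `reported`, `log_odds_summary`, and `summaries` are hypothetical.

```python
import math

# Odds ratios and 95% CIs exactly as reported in the abstract:
# (OR, lower bound, upper bound)
reported = {
    "SOFA at admission": (1.83, 1.41, 2.39),
    "delta SOFA": (1.64, 1.28, 2.10),
    "AVPU status": (5.96, 2.13, 16.67),
}

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_odds_summary(odds_ratio, lower, upper):
    """Return (coefficient, standard error, Wald z) on the log-odds scale.

    Assumption: the CI is a Wald interval, i.e. exp(beta +/- Z_95 * se).
    """
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    return beta, se, beta / se


summaries = {name: log_odds_summary(*values) for name, values in reported.items()}

for name, (beta, se, z) in summaries.items():
    print(f"{name}: beta = {beta:.3f}, SE = {se:.3f}, Wald z = {z:.2f}")
```

Under this assumption, each Wald z statistic exceeds 3.29, the two-sided critical value for p = 0.001, which agrees with the reported p < 0.001 for all three predictors. Note that the odds ratios are per unit of each predictor, so their magnitudes are not directly comparable across variables measured on different scales.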

Keywords: SOFA; internal medicine; machine learning; prognostication; sepsis.

Grants and funding

This research received no external funding.