Predicting 30-days mortality for MIMIC-III patients with sepsis-3: a machine learning approach using XGboost

J Transl Med. 2020 Dec 7;18(1):462. doi: 10.1186/s12967-020-02620-5.


Background: Sepsis is a significant cause of in-hospital mortality, especially in ICU patients. Early prediction of sepsis outcomes is essential, as prompt and appropriate treatment can improve survival. Machine learning methods are flexible prediction algorithms with potential advantages over conventional regression and scoring systems. The aims of this study were to develop a machine learning approach using XGBoost to predict 30-day mortality for MIMIC-III patients with sepsis-3 and to determine whether such a model performs better than traditional prediction models.

Methods: Using MIMIC-III v1.4, we identified patients with sepsis-3. Patients were divided into two groups according to death or survival within 30 days. Variables were selected on the basis of clinical significance and availability using stepwise analysis, and were then described and compared between the two groups. Three predictive models were constructed in R: a conventional logistic regression model, a SAPS-II score prediction model, and an XGBoost model. The performance of the three models was tested and compared using the area under the receiver operating characteristic curve (AUC) and decision curve analysis. Finally, a nomogram and a clinical impact curve were used to validate the model.
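The two comparison metrics named in the Methods can be illustrated with a minimal Python sketch. This is not the study's R code: the function names `auc` and `net_benefit`, the toy labels, and the toy risk scores are all hypothetical, chosen only to show how discrimination (AUC) and clinical utility (net benefit, the quantity plotted in a decision curve) are computed for competing models.

```python
def auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(y_true, scores, threshold):
    """Net benefit at one threshold probability, as used in decision
    curve analysis: TP/n - FP/n * threshold / (1 - threshold)."""
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Hypothetical toy data: 1 = died within 30 days, 0 = survived.
y = [1, 1, 0, 0, 1, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.2, 0.6, 0.3, 0.1, 0.4]  # hypothetical risk scores
model_b = [0.6, 0.4, 0.7, 0.5, 0.3, 0.2, 0.8, 0.1]  # hypothetical risk scores
treat_all = [1.0] * len(y)  # reference strategy: treat every patient

print("AUC model A:", round(auc(y, model_a), 3))
print("AUC model B:", round(auc(y, model_b), 3))
print("Net benefit A at 0.5:", net_benefit(y, model_a, 0.5))
print("Net benefit treat-all at 0.5:", net_benefit(y, treat_all, 0.5))
```

A decision curve repeats the net benefit calculation over a range of threshold probabilities and plots each model against the treat-all and treat-none (net benefit 0) strategies; the model with the highest curve over the clinically relevant thresholds is preferred.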

Results: A total of 4559 sepsis-3 patients were included in the study, of whom 889 died and 3670 survived within 30 days. According to the AUCs (0.819 [95% CI 0.800-0.838], 0.797 [95% CI 0.781-0.813] and 0.857 [95% CI 0.839-0.876] for the logistic regression, SAPS-II and XGBoost models, respectively) and the decision curve analysis, the XGBoost model performed best. The risk nomogram and clinical impact curve confirmed that the XGBoost model has significant predictive value.
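The reported figures can be checked with a few lines of arithmetic. The Python sketch below uses only the numbers stated in the Results; the assignment of each AUC to a model assumes they are listed in the same order as the models in the Methods, and the helper `intervals_overlap` is a hypothetical name. Comparing confidence intervals for overlap is only an informal check, not a formal test of the difference between AUCs.

```python
# Cohort counts as reported in the Results.
died, survived = 889, 3670
total = died + survived
mortality = died / total

# (AUC, lower 95% CI, upper 95% CI); the model labels assume the AUCs
# are listed in the same order as the models in the Methods.
auc_ci = {
    "logistic_regression": (0.819, 0.800, 0.838),
    "saps_ii": (0.797, 0.781, 0.813),
    "xgboost": (0.857, 0.839, 0.876),
}


def intervals_overlap(a, b):
    """Return True if the two (estimate, lower, upper) intervals overlap."""
    return a[1] <= b[2] and b[1] <= a[2]


print("Total patients:", total)
print("30-day mortality: {:.1%}".format(mortality))
for name in ("logistic_regression", "saps_ii"):
    overlap = intervals_overlap(auc_ci["xgboost"], auc_ci[name])
    print("XGBoost CI overlaps", name, "CI:", overlap)
```

Under these assumptions, the lower bound of the XGBoost interval (0.839) lies just above the upper bounds of the other two intervals (0.838 and 0.813), which is consistent with the statement that XGBoost discriminated best.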

Conclusions: Using the XGBoost machine learning technique, a prediction model that outperforms the conventional logistic regression and SAPS-II models can be built. This XGBoost model may prove clinically useful and assist clinicians in tailoring precise management and therapy for patients with sepsis-3.

Keywords: Logistic regression; MIMIC-III; Machine learning; SAPS-II score; Sepsis-3; XGBoost.

MeSH terms

  • Hospital Mortality
  • Humans
  • Logistic Models
  • Machine Learning*
  • ROC Curve
  • Sepsis* / diagnosis