An interpretable machine learning model for predicting 28-day mortality in patients with sepsis-associated liver injury

PLoS One. 2024 May 20;19(5):e0303469. doi: 10.1371/journal.pone.0303469. eCollection 2024.

Abstract

Sepsis-associated liver injury (SALI) is an independent risk factor for death from sepsis. The aim of this study was to develop an interpretable machine learning model for early prediction of 28-day mortality in patients with SALI. Data from the Medical Information Mart for Intensive Care (MIMIC-IV v2.2 and MIMIC-III v1.4) were used. The MIMIC-IV cohort was randomly split into a training set (70%) and an internal validation set (30%), with MIMIC-III (2001 to 2008) serving as external validation. Features with more than 20% missing values were deleted, and missing values in the remaining features were filled in by multiple imputation. Lasso-CV, a lasso linear model with iterative fitting along a regularization path in which the best model is selected by cross-validation, was used to select important features for model development. Eight machine learning models were developed: Random Forest (RF), Logistic Regression, Decision Tree, Extreme Gradient Boosting (XGBoost), K-Nearest Neighbor, Support Vector Machine, Generalized Linear Models in which the best model is selected by cross-validation (CV_glmnet), and Linear Discriminant Analysis (LDA). SHapley Additive exPlanations (SHAP) was used to improve the interpretability of the optimal model. In total, 1043 patients were included, of whom 710 were from MIMIC-IV and 333 from MIMIC-III. Twenty-four clinically relevant parameters were selected for model construction. For the prediction of 28-day mortality in patients with SALI in the internal validation set, RF performed best, with an area under the curve (AUC) of 0.79 (95% CI: 0.73-0.86). RF also outperformed the traditional disease severity scores, including the Oxford Acute Severity of Illness Score (OASIS), Sequential Organ Failure Assessment (SOFA), Simplified Acute Physiology Score II (SAPS II), Logistic Organ Dysfunction Score (LODS), Systemic Inflammatory Response Syndrome (SIRS), and Acute Physiology Score III (APS III). SHAP analysis found that urine output, Charlson Comorbidity Index (CCI), minimum Glasgow Coma Scale (GCS_min), blood urea nitrogen (BUN), and age at admission (admission_age) were the five most important features affecting the RF model. Therefore, RF has good predictive ability for 28-day mortality in SALI. Urine output, CCI, GCS_min, BUN, and admission_age within 24 h after intensive care unit (ICU) admission contribute significantly to model prediction.
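
The data-handling and evaluation steps named in the abstract (dropping features with more than 20% missing values, the 70/30 random split, and reporting an AUC with a 95% CI) can be sketched in plain Python. This is a minimal illustration and not the authors' code: the function names, the use of `None` to mark a missing value, the unstratified shuffle, and the percentile bootstrap for the confidence interval are all assumptions, since the abstract does not say how the split or the interval was computed. Imputation, feature selection, model fitting, and SHAP are left out.

```python
import random


def drop_sparse_features(rows, max_missing=0.20):
    """Keep only features whose share of missing (None) values is <= max_missing.

    `rows` is a list of dicts that all share the same keys (hypothetical layout).
    """
    n = len(rows)
    keep = [
        name for name in rows[0]
        if sum(r[name] is None for r in rows) / n <= max_missing
    ]
    return [{name: r[name] for name in keep} for r in rows]


def split_train_validation(rows, train_fraction=0.7, seed=0):
    """Shuffle the rows and split them into training and validation parts.

    Assumption: a plain (unstratified) shuffle with a fixed seed.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC (an assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample needs both classes for the AUC to exist
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

For example, `auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` evaluates to 0.75, because three of the four positive-negative pairs are ranked correctly. The pairwise AUC is quadratic in the number of cases, which is acceptable for a validation set of a few hundred patients.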

MeSH terms

  • Aged
  • Female
  • Humans
  • Liver Diseases / mortality
  • Machine Learning*
  • Male
  • Middle Aged
  • Prognosis
  • Risk Factors
  • Sepsis* / mortality

Grants and funding

This study was supported by the Sichuan Science and Technology Program (2022YFS0626), Southwest Medical University (2022QN073), the Sichuan Science Technology Innovation Seedling Project (MZGC20230040), and Southwest Medical University and Xuyong County People's Hospital (2023XYXNYD16). Funding related to 2022YFS062, 2022QN073, MZGC20230040, and 2023XYXNYD16 was received by Chengli Wen, who wrote the original draft, completed data curation, and reviewed and edited the manuscript. Funding related to 2022YFS063 and 2021(451) was received by Xianying Lei, who completed the conceptualization with Tao Xu and Sicheng Liang; 2022NSFSC0576 was received by Sicheng Liang, who assisted Xianying Lei with the conceptualization; 2021SNXNYD05 was received by Muhan Lü, who supervised the progress and quality of the research and reviewed and edited the manuscript.