Purpose: This study aimed to develop and validate a machine learning (ML) model for predicting prolonged hospitalization in patients with asthma.
Patients and methods: This retrospective cohort study included patients with asthma as the primary diagnosis. We randomly divided 2820 asthma patients from Beth Israel Deaconess Medical Center into a training set and an internal validation set (in an 8:2 ratio), and used 1714 asthma patients from 208 other hospitals in the United States as an external validation cohort. Prolonged hospitalization was the primary outcome. Feature selection was conducted using LASSO regression, univariate logistic regression, and multivariate logistic regression analyses. Nine ML algorithms were employed to develop predictive models.
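The 8:2 random split described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `split_train_validation`, the seed value, and the use of integer indices in place of patient records are assumptions made for illustration, not the study's actual code. An exact 8:2 split of 2820 patients would give 2256 and 564; the realized set sizes are not stated here.

```python
import random

def split_train_validation(ids, train_fraction=0.8, seed=42):
    """Randomly partition ids into a training set and an internal validation set.

    Hypothetical helper: the seed and the plain (unstratified) shuffle are
    assumptions, since the abstract only says the split was random and 8:2.
    """
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Integer indices stand in for the 2820 single-center patients.
train_ids, valid_ids = split_train_validation(range(2820))
print(len(train_ids), len(valid_ids))  # 2256 564
```

The external cohort is not part of this split: it is held out entirely and used only for evaluation.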
Results: Based on discrimination, calibration, and clinical utility, the Extreme Gradient Boosting (XGBoost) model demonstrated the best overall performance. The nine most important predictors in the model were age, oxygen saturation (SpO2), red blood cell count, hemoglobin level, comorbid pneumonia, chronic obstructive pulmonary disease (COPD), congestive heart failure, anxiety, and use of invasive mechanical ventilation. The XGBoost model achieved an area under the receiver operating characteristic curve (AUC) of 0.829 and a Cohen's kappa of 0.439 in the internal validation set, and an AUC of 0.745 and a Cohen's kappa of 0.315 in the external validation set. Decision curve analysis indicated good clinical utility of the model.
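For readers less familiar with the two reported metrics, the sketch below computes AUC and Cohen's kappa from scratch on a small invented example. The labels, scores, and 0.5 classification threshold are toy assumptions and are unrelated to the study data; the function names `roc_auc` and `cohen_kappa` are hypothetical helpers, not the authors' implementation.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case is scored higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def cohen_kappa(y_true, y_pred):
    """Agreement between predicted and true binary labels, corrected for the
    agreement expected by chance. Undefined if chance agreement equals 1."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    expected = sum(
        (y_true.count(c) / n) * (y_pred.count(c) / n) for c in (0, 1)
    )
    return (observed - expected) / (1 - expected)

# Toy data (assumed): 1 = prolonged stay, 0 = not prolonged.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.3, 0.6, 0.7, 0.9]
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 threshold

auc = roc_auc(y_true, scores)
kappa = cohen_kappa(y_true, y_pred)
print(auc, kappa)  # 0.6875 0.5
```

AUC depends only on how the model ranks patients, whereas kappa depends on the chosen threshold, which is one reason a model can show a fairly high AUC alongside a more modest kappa.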
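Conclusions: The XGBoost model predicted prolonged hospitalization in asthma patients with an AUC of 0.829 in internal validation and 0.745 in external validation, and showed good clinical utility on decision curve analysis.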
Keywords: Prolonged hospitalization; XGBoost; anxiety; comorbid pneumonia; length of stay; machine learning; prediction model.