Developing and validating machine learning-based prediction models for frailty occurrence in those with chronic obstructive pulmonary disease

J Thorac Dis. 2024 Apr 30;16(4):2482-2498. doi: 10.21037/jtd-24-416. Epub 2024 Apr 29.

Abstract

Background: Frailty is a multifactorial medical syndrome characterized by decreased strength and endurance and diminished physiological function, which increases susceptibility to dependence and/or death. Patients with chronic obstructive pulmonary disease (COPD) are particularly vulnerable to frailty because of the physical and psychological burdens of the disease. This study therefore aimed to develop a reliable and accurate risk prediction model for frailty in patients with COPD to improve the identification and prediction of frailty in this population. The specific objectives were to determine the prevalence of frailty in patients with COPD, develop a prediction model, and evaluate its predictive performance.

Methods: Clinical information from the 2018 China Health and Retirement Longitudinal Study (CHARLS) database was analyzed, and 34 indicators covering behavioral factors, health status, mental health parameters, and sociodemographic variables were examined. The adaptive synthetic (ADASYN) sampling technique was used to address class imbalance. Three methods (a ridge regressor, an extreme gradient boosting (XGBoost) classifier, and a random forest (RF) regressor) were used to select predictors. Seven machine learning (ML) techniques, namely logistic regression (LR), support vector machines (SVM), multilayer perceptron, light gradient-boosting machine, XGBoost, RF, and K-nearest neighbors, were compared to identify the optimal model. An online risk prediction website was created for individualized risk assessment, and Shapley additive explanation (SHAP) was used to interpret the model's predictions.
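The abstract names adaptive synthetic (ADASYN) sampling but does not report its settings. As a rough illustration of the technique, the following minimal pure-Python sketch follows the original ADASYN procedure: each minority-class (frail) sample receives a number of synthetic neighbours proportional to the share of majority-class points among its k nearest neighbours, so harder-to-learn samples near the class boundary are oversampled more. The function name `adasyn` and the parameters `k`, `beta`, and `seed` are hypothetical choices for this sketch, not the authors' configuration; in practice a library implementation such as the `ADASYN` class in imbalanced-learn would typically be used.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def _k_nearest(idx: int, X: Sequence[Vector], pool: Sequence[int], k: int) -> List[int]:
    """Indices from `pool` of the k points closest (Euclidean) to X[idx], excluding idx itself."""
    others = [j for j in pool if j != idx]
    others.sort(key=lambda j: math.dist(X[idx], X[j]))
    return others[:k]


def adasyn(X: Sequence[Vector], y: Sequence[int], minority: int = 1,
           k: int = 5, beta: float = 1.0, seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Return (synthetic_X, synthetic_y) for the minority class using ADASYN.

    Hypothetical sketch: beta = 1.0 aims for a fully balanced class ratio.
    """
    rng = random.Random(seed)
    all_idx = list(range(len(X)))
    min_idx = [i for i in all_idx if y[i] == minority]
    n_min, n_maj = len(min_idx), len(X) - len(min_idx)
    G = int((n_maj - n_min) * beta)  # total number of synthetic samples to generate
    if G <= 0 or n_min < 2:
        return [], []

    # r_i: fraction of majority-class points among the k nearest neighbours of minority point i
    r = []
    for i in min_idx:
        nbrs = _k_nearest(i, X, all_idx, k)
        r.append(sum(1 for j in nbrs if y[j] != minority) / len(nbrs))
    total = sum(r)
    if total == 0:  # no minority point lies near the class boundary
        return [], []

    synth_X: List[Vector] = []
    synth_y: List[int] = []
    for i, r_i in zip(min_idx, r):
        g_i = round(r_i / total * G)  # harder points get proportionally more samples
        min_nbrs = _k_nearest(i, X, min_idx, k)  # neighbours within the minority class only
        for _ in range(g_i):
            z = X[rng.choice(min_nbrs)]
            lam = rng.random()  # interpolate at a random point on the segment X[i] -> z
            synth_X.append([a + lam * (b - a) for a, b in zip(X[i], z)])
            synth_y.append(minority)
    return synth_X, synth_y
```

Because each g_i is rounded, the number of generated samples only approximates G. Oversampling of this kind is normally applied to the training split alone, so that synthetic points do not leak into the test set used to report performance.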

Results: Depression, smoking, gender, social activities, dyslipidemia, asthma, and residence type (urban vs. rural) were identified as predictors of frailty in patients with COPD. In the test set, the XGBoost model achieved an area under the curve (AUC) of 0.942 (95% confidence interval: 0.925-0.959), an accuracy of 0.915, a sensitivity of 0.873, and a specificity of 0.911, the best performance among the seven models evaluated.
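The reported metrics are computed from a model's predicted probabilities and a decision threshold. The sketch below shows one way to do this in pure Python: accuracy, sensitivity, and specificity from the confusion matrix; the AUC as the probability that a randomly chosen frail patient is scored higher than a randomly chosen non-frail patient; and a 95% confidence interval by percentile bootstrap. The abstract does not state how the confidence interval was derived (DeLong's method is a common alternative), so the bootstrap, the function names `confusion_metrics`, `roc_auc`, and `bootstrap_auc_ci`, and the toy data mentioned below are assumptions for illustration only and do not reproduce the study's data.

```python
import random


def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels (1 = frail, 0 = not frail)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of frail patients correctly flagged
        "specificity": tn / (tn + fp),  # share of non-frail patients correctly cleared
    }


def roc_auc(y_true, scores):
    """AUC via the Mann-Whitney statistic: P(positive score > negative score), ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap (1 - alpha) confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients with replacement
        yt = [y_true[i] for i in idx]
        if 0 < sum(yt) < n:  # skip resamples containing only one class
            stats.append(roc_auc(yt, [scores[i] for i in idx]))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]
```

For example, with toy labels [1, 1, 1, 0, 0, 0, 0] and scores [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1] thresholded at 0.5, accuracy is 5/7, sensitivity is 2/3, specificity is 3/4, and the AUC is 11/12. The pairwise AUC is quadratic in sample size and is meant only for small illustrations.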

Conclusions: The ML prediction model developed in this study is a useful and easy-to-use tool for assessing frailty risk in patients with COPD and may help clinicians screen high-risk patients.

Keywords: Chronic obstructive pulmonary disease (COPD); Shapley additive explanation (SHAP); frailty; machine learning (ML); prediction model.