Developing and validating machine learning-based prediction models for frailty occurrence in those with chronic obstructive pulmonary disease

Yong Chen; Yonglin Yu; Dongmei Yang; Wenbo Zhang; Vasileios Kouritas; Xiaoju Chen

doi:10.21037/jtd-24-416

J Thorac Dis. 2024 Apr 30;16(4):2482-2498. doi: 10.21037/jtd-24-416. Epub 2024 Apr 29.

Authors

Yong Chen¹, Yonglin Yu², Dongmei Yang¹, Wenbo Zhang¹, Vasileios Kouritas³, Xiaoju Chen⁴

Affiliations

¹ Department of Respiratory and Critical Care Medicine, The Affiliated Hospital of North Sichuan Medical College, Nanchong, China.
² Department of Stomatology, The Affiliated Hospital of North Sichuan Medical College, Nanchong, China.
³ Department of Thoracic Surgery, Norfolk and Norwich University Hospital, Norwich, UK.
⁴ Department of Respiratory and Critical Care Medicine, Clinical Medical College & Affiliated Hospital of Chengdu University, Chengdu, China.

Abstract

Background: Frailty is a multifactorial medical syndrome characterized by decreased strength and endurance and diminished physiological function, which increases susceptibility to dependence and/or death. Patients with chronic obstructive pulmonary disease (COPD) tend to be more vulnerable to frailty because of their physical and psychological burdens. This study therefore aimed to develop a reliable and accurate risk prediction model for frailty in patients with COPD in order to improve the identification of frail patients. The specific objectives were to determine the prevalence of frailty in patients with COPD, to develop a prediction model, and to evaluate its predictive power.

Methods: Data from the 2018 China Health and Retirement Longitudinal Study (CHARLS) database were analyzed, and 34 indicators were examined, including behavioral factors, health status, mental health parameters, and various sociodemographic variables. The adaptive synthetic sampling (ADASYN) technique was used to address class imbalance. Three methods, ridge regressor, extreme gradient boosting (XGBoost) classifier, and random forest (RF) regressor, were used to select predictors. Seven machine learning (ML) techniques, namely logistic regression (LR), support vector machines (SVM), multilayer perceptron, light gradient-boosting machine, XGBoost, RF, and K-nearest neighbors, were compared to determine the optimal model. For individualized risk assessment, an online risk prediction website was created, along with Shapley additive explanation (SHAP) interpretations.
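To illustrate the idea behind the SHAP interpretations mentioned in the Methods, the following is a minimal sketch of an exact Shapley value computation on a toy risk score. It is not the study's model or code: the function names (`shapley_values`, `toy_risk`) and all numeric effects are hypothetical, chosen only so the result can be checked by hand. The three feature names are borrowed from the reported predictors purely as labels.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values: weighted average marginal contribution of each
    feature over all subsets of the remaining features."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):  # k = size of the subset that excludes f
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


def toy_risk(present):
    """Hypothetical risk score (assumption, not from the paper):
    a baseline plus additive effects and one interaction term."""
    score = 0.10  # baseline risk with no features present
    if "depression" in present:
        score += 0.30
    if "smoking" in present:
        score += 0.20
    if "asthma" in present:
        score += 0.05
    if "depression" in present and "smoking" in present:
        score += 0.10  # interaction, shared equally by the two features
    return score


features = ["depression", "smoking", "asthma"]
phi = shapley_values(toy_risk, features)
for name, value in phi.items():
    print(f"{name}: {value:.3f}")
```

The attributions sum to the difference between the full score and the baseline, which is the additivity property that makes Shapley-based explanations readable per patient. Exact enumeration grows exponentially with the number of features, so practical SHAP tooling relies on model-specific or sampling-based algorithms instead.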

Results: Depression, smoking, gender, social activities, dyslipidemia, asthma, and residence type (urban vs. rural) were predictors of frailty in patients with COPD. In the test set, the XGBoost model achieved an area under the curve of 0.942 (95% confidence interval: 0.925-0.959), an accuracy of 0.915, a sensitivity of 0.873, and a specificity of 0.911, and was the best-performing model.
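For readers less familiar with the metrics reported in the Results, here is a minimal sketch of how accuracy, sensitivity, specificity, and the area under the ROC curve are computed from labels and predicted probabilities. The labels, scores, 0.5 threshold, and function names are made up for illustration and are unrelated to the study data.

```python
def classification_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, sensitivity, and specificity at a fixed probability threshold."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def auc_pairwise(y_true, y_score):
    """AUC as the probability that a random positive case scores higher than
    a random negative case (ties count as half)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = frail, 0 = not frail.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1, 0.05]

metrics = classification_metrics(y_true, y_score)
auc = auc_pairwise(y_true, y_score)
print(metrics, round(auc, 3))
```

Unlike accuracy, sensitivity, and specificity, the AUC does not depend on the chosen threshold, which is why it is often used to compare candidate models before a cut-off is fixed.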

Conclusions: The ML prediction model developed in this study is a useful and easy-to-use instrument for assessing frailty risk in patients with COPD and may aid clinicians in screening high-risk patients.

Keywords: Chronic obstructive pulmonary disease (COPD); Shapley additive explanation (SHAP); frailty; machine learning (ML); prediction model.