Prediction of mortality risk of health checkup participants using machine learning-based models: the J-SHC study

Sci Rep. 2022 Aug 19;12(1):14154. doi: 10.1038/s41598-022-18276-8.

Abstract

Early detection and treatment of diseases through health checkups are effective in improving life expectancy. In this study, we compared the predictive ability for 5-year mortality between two machine learning-based models (gradient boosting decision tree [XGBoost] and neural network) and a conventional logistic regression model in 116,749 health checkup participants. We built the prediction models using a training dataset of 85,361 participants from 2008 and evaluated them using a test dataset of 31,388 participants from 2009 to 2014. Predictive ability was evaluated by the area under the receiver operating characteristic curve (AUC) in the test dataset. The AUC values were 0.811 for XGBoost, 0.774 for the neural network, and 0.772 for logistic regression, indicating that XGBoost had the highest predictive ability. The importance rating of each explanatory variable was evaluated using SHapley Additive exPlanations (SHAP) values, and the ratings were similar among the three models. This study showed that the machine learning-based XGBoost model had a higher predictive ability than the conventional logistic regression model and may be useful for risk assessment and health guidance for health checkup participants.
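
As an illustration of the evaluation metric named above, the following is a minimal sketch of how an AUC can be computed from predicted risk scores and observed outcomes, using the rank-based (Mann-Whitney) formulation. It is not the study's code: the function name `roc_auc`, the toy outcomes, and the toy scores are all hypothetical and serve only to show the calculation.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) formulation.

    labels: iterable of 0/1 outcomes (1 = event, e.g. death within follow-up).
    scores: iterable of predicted risks; higher means higher predicted risk.
    Tied scores receive the average of their ranks, so a tie counts as half.
    """
    pairs = sorted(zip(scores, labels))  # ascending by score
    n = len(pairs)
    ranks = [0.0] * n

    # Assign 1-based ranks, averaging over runs of tied scores.
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined unless both outcome classes are present")

    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_statistic / (n_pos * n_neg)


# Toy data only (hypothetical, not from the study): six participants,
# with risk scores from two made-up models.
toy_outcomes = [0, 0, 1, 1, 0, 1]
toy_scores = {
    "model_a": [0.05, 0.20, 0.60, 0.90, 0.10, 0.40],
    "model_b": [0.30, 0.20, 0.25, 0.90, 0.10, 0.40],
}

for name, scores in toy_scores.items():
    print(name, round(roc_auc(toy_outcomes, scores), 3))
```

An AUC of 0.5 corresponds to ranking no better than chance and 1.0 to perfect separation of those who died from those who did not, which is the scale on which the reported values of 0.811, 0.774, and 0.772 are compared.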

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Humans
  • Logistic Models
  • Machine Learning*
  • Neural Networks, Computer*
  • ROC Curve
  • Risk Assessment