Development and validation of an insulin resistance model for a population without diabetes mellitus and its clinical implication: a prospective cohort study

EClinicalMedicine. 2023 Apr 4:58:101934. doi: 10.1016/j.eclinm.2023.101934. eCollection 2023 Apr.

Abstract

Background: Insulin resistance (IR) is associated with diabetes mellitus, cardiovascular disease (CV), and mortality. Few studies have used machine learning to predict IR in the non-diabetic population.

Methods: In this prospective cohort study, we trained a predictive model for IR in non-diabetic populations using the US National Health and Nutrition Examination Survey (NHANES, from JAN 01, 1999 to DEC 31, 2012) database and the Taiwan MAJOR (MJ; from JAN 01, 2008 to DEC 31, 2017) database. Participants in NHANES and MJ were excluded if they were aged <18 years, had incomplete laboratory data, or had diabetes mellitus. To investigate the clinical implications (CV and all-cause mortality) of the trained model, we tested it on the Taiwan biobank (TWB) database from DEC 10, 2008 to NOV 30, 2018. We then used SHapley Additive exPlanation (SHAP) values to explain differences across the machine learning models.
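As an illustration of the idea behind SHAP values, the following minimal sketch computes exact Shapley values for a single input by enumerating feature subsets. It is not the study's pipeline: the function `toy_score`, its coefficients, the three example features, and the baseline input are all hypothetical, and practical SHAP tooling for tree models uses faster algorithms than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one input x, relative to a baseline input."""
    n = len(x)
    features = range(n)

    def value(subset):
        # Features in `subset` keep their real values; the rest use the baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in features]
        return predict(mixed)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size that excludes feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_score(v):
    # Hypothetical risk score (NOT the study's model): BMI, FPG, triglyceride,
    # with an interaction term between triglyceride and BMI.
    bmi, fpg, tg = v
    return 0.05 * bmi + 0.02 * fpg + 0.001 * tg * bmi


x = [31.0, 105.0, 180.0]         # hypothetical participant
baseline = [24.0, 90.0, 110.0]   # hypothetical reference values
phi = shapley_values(toy_score, x, baseline)
print(phi)
```

The per-feature attributions sum to the difference between the score for `x` and the score for the baseline, which is the property that makes such values useful for comparing how different models weigh the same features.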

Findings: Of all participants (combined NHANES and MJ databases), we randomly selected 14,705 participants for the training group and 4018 participants for the validation group. In the validation group, the areas under the curve (AUC) of all models were >0.80 (highest being XGBoost, 0.87). In the test group, all AUCs were also >0.80 (highest being XGBoost, 0.88). Among all 9 features (age, gender, race, body mass index (BMI), fasting plasma glucose (FPG), glycohemoglobin, triglyceride, total cholesterol and high-density lipoprotein cholesterol), BMI had the highest feature importance for IR (0.43 for the XGBoost and 0.47 for the random forest (RF) algorithms). All participants from the TWB database were separated into an IR group and a non-IR group according to the XGBoost algorithm. The Kaplan-Meier survival curves showed a significant difference between the IR and non-IR groups (p < 0.0001 for CV mortality, and p = 0.0006 for all-cause mortality). Therefore, the XGBoost model has clear clinical implications for predicting not only IR but also CV and all-cause mortality.
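For readers unfamiliar with the AUC metric reported above, a minimal sketch follows. It computes the AUC as the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as half. The labels and scores are made-up illustrative values, not study data, and the function name `roc_auc` is an assumption for this example.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0      # correctly ordered pair
            elif p == q:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = insulin resistant, 0 = not; scores are model outputs.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so values above 0.80 indicate that the models rank most IR cases above most non-IR cases.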

Interpretation: Our machine learning model needs only 9 easily obtained features to predict IR in non-diabetic patients with high accuracy. Patients whom the model classifies as having IR also have significantly higher CV and all-cause mortality. The model can be applied to both Asian and Caucasian populations in clinical practice.

Funding: Taichung Veterans General Hospital, Taiwan and Japan Society for the Promotion of Science KAKENHI Grant Number JP21KK0293.

Keywords: Insulin resistance; MAJOR (MJ) research database; Machine learning; National health and nutrition examination survey (NHANES); Taiwan biobank (TWB).