Environmental chemical exposure dynamics and machine learning-based prediction of diabetes mellitus

Sci Total Environ. 2022 Feb 1;806(Pt 2):150674. doi: 10.1016/j.scitotenv.2021.150674. Epub 2021 Sep 29.

Abstract

Background: With dramatically increasing prevalence, diabetes mellitus has imposed a tremendous toll on individual well-being. Humans are exposed to various environmental chemicals, which have been postulated as underappreciated but potentially modifiable diabetes risk factors.

Objectives: To determine the utility of environmental chemical exposure in predicting diabetes mellitus.

Methods: A total of 8501 eligible participants from NHANES 2005-2016 were randomly assigned to a discovery (N = 5953) set and a validation (N = 2548) set. We applied random forest (RF) and least absolute shrinkage and selection operator (LASSO) regression with 10-fold cross-validation in the discovery set to select features, and built an optimal model to predict diabetes mellitus, blood insulin, fasting plasma glucose (FPG) and 2-h plasma glucose after oral glucose tolerance test (2-h PG after OGTT).
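The partitioning step described in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `split_discovery_validation` and `kfold_indices`, the seed, and the use of integer participant indices are all hypothetical. Only the set sizes (8501, 5953, 2548) and the fold count (10) come from the abstract, and the feature selection itself (RF and LASSO) is not shown.

```python
import random

def split_discovery_validation(n_total, n_discovery, seed=0):
    """Randomly assign indices 0..n_total-1 to a discovery and a validation set."""
    rng = random.Random(seed)  # seed is an assumption, used only for reproducibility
    indices = list(range(n_total))
    rng.shuffle(indices)
    return indices[:n_discovery], indices[n_discovery:]

def kfold_indices(items, k=10, seed=0):
    """Yield (train, held_out) index lists for k-fold cross-validation."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    # Striding by k gives folds whose sizes differ by at most one.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, held_out

# Set sizes taken from the abstract; participants are stand-in integer indices.
discovery, validation = split_discovery_validation(8501, 5953)
folds = list(kfold_indices(discovery, k=10))
print(len(discovery), len(validation), len(folds))  # prints: 5953 2548 10
```

Cross-validation is run inside the discovery set only, so the validation set remains untouched until the final model is evaluated.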

Results: The machine learning model using LASSO regression predicted diabetes with an area under the receiver operating characteristic curve (AUROC) of 0.80 in the discovery set and 0.78 in the validation set. The linear model predicted blood insulin level with an R² of 0.42 in the discovery set and 0.40 in the validation set. For FPG, the R² was 0.16 in the discovery set and 0.15 in the validation set. For 2-h PG after OGTT, the R² was 0.18 in the discovery set and 0.17 in the validation set.
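For readers less familiar with the two metrics reported above, the sketch below computes an AUROC and a coefficient of determination from their textbook definitions. The function names `auroc` and `r_squared` are hypothetical, and the input values are toy numbers invented for illustration; they are not data from the study and do not reproduce its results.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Toy inputs (assumed, not study data).
toy_auc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
toy_r2 = r_squared([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
print(toy_auc)  # prints: 0.75
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, while R² is the share of outcome variance explained by the predictions, which puts the reported values for diabetes status, insulin, and glucose measures on a common footing.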

Conclusion: Using environmental chemical exposure data, we constructed machine learning models that predicted diabetes with relatively good accuracy, underscoring the predictive value of widespread environmental chemicals for complex diseases.

Keywords: Diabetes; Environmental chemicals; Machine learning; Prediction model.

Publication types

  • Randomized Controlled Trial

MeSH terms

  • Diabetes Mellitus* / epidemiology
  • Fasting
  • Humans
  • Machine Learning
  • Nutrition Surveys
  • ROC Curve