Machine Learning Models in Type 2 Diabetes Risk Prediction: Results from a Cross-sectional Retrospective Study in Chinese Adults

Curr Med Sci. 2019 Aug;39(4):582-588. doi: 10.1007/s11596-019-2077-4. Epub 2019 Jul 25.

Abstract

Type 2 diabetes mellitus (T2DM) has become a prevalent health problem in China, especially in urban areas. Early prevention strategies are needed to reduce the associated mortality and morbidity. We combined rules with several machine learning techniques to assess the risk of developing T2DM in an urban Chinese adult population. A retrospective analysis was performed on 8000 people without diabetes and 3845 people with T2DM in Nanjing. Five machine learning techniques, Multilayer Perceptron (MLP), AdaBoost (AD), Trees Random Forest (TRF), Support Vector Machine (SVM), and Gradient Tree Boosting (GTB), were used with 10-fold cross-validation in the proposed model to predict the risk of developing T2DM. The performance of these models was evaluated by accuracy, precision, sensitivity, specificity, and area under the receiver operating characteristic (ROC) curve (AUC). The prediction accuracy of the five models was 0.87, 0.86, 0.86, 0.86, and 0.86, respectively. A combined model that gave each component the same voting weight was then built for T2DM, and it performed better than the individual models. The findings indicate that combining machine learning models could provide an accurate assessment model for T2DM risk prediction.
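
The equal-weight voting and the evaluation metrics named in the abstract can be illustrated with a minimal sketch in Python using scikit-learn. This is not the authors' implementation. The data are synthetic stand-ins generated with `make_classification`, all hyperparameters are library defaults or arbitrary choices, and soft (probability-averaging) voting is an assumption, since the abstract states only that each component had the same voting weight.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): the study data are not available here.
# The class weights only roughly mirror the 8000 : 3845 ratio in the abstract.
X, y = make_classification(n_samples=400, n_features=10, n_informative=5,
                           weights=[0.675, 0.325], random_state=0)

# The five model families named in the abstract, with illustrative settings.
members = [
    ("mlp", make_pipeline(StandardScaler(),
                          MLPClassifier(max_iter=500, random_state=0))),
    ("ad", AdaBoostClassifier(random_state=0)),
    ("trf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("svm", make_pipeline(StandardScaler(),
                          SVC(probability=True, random_state=0))),
    ("gtb", GradientBoostingClassifier(random_state=0)),
]

# weights=None gives every member the same voting weight.
# voting="soft" (assumption) averages predicted probabilities, which also
# yields a continuous score for the AUC.
ensemble = VotingClassifier(estimators=members, voting="soft", weights=None)

# 10-fold stratified cross-validation: each sample is scored by a model
# that did not see it during training.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
proba = cross_val_predict(ensemble, X, y, cv=cv, method="predict_proba")[:, 1]
pred = (proba >= 0.5).astype(int)

# The five metrics listed in the abstract, derived from the confusion matrix.
tn, fp, fn, tp = confusion_matrix(y, pred).ravel()
metrics = {
    "accuracy": (tp + tn) / (tp + tn + fp + fn),
    "precision": tp / (tp + fp),
    "sensitivity": tp / (tp + fn),   # true positive rate (recall)
    "specificity": tn / (tn + fp),   # true negative rate
    "auc": roc_auc_score(y, proba),
}

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Replacing `ensemble` with any single entry of `members` in the `cross_val_predict` call gives the corresponding individual-model metrics, which is one way to compare the combined model against its components.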

Keywords: machine learning; risk prediction; type 2 diabetes.

MeSH terms

  • Adult
  • China / epidemiology
  • Cross-Sectional Studies
  • Diabetes Mellitus, Type 2 / diagnosis
  • Diabetes Mellitus, Type 2 / epidemiology*
  • Diabetes Mellitus, Type 2 / pathology
  • Female
  • Humans
  • Machine Learning*
  • Male
  • Retrospective Studies
  • Risk Assessment*