Screening for prediabetes using machine learning models

Soo Beom Choi; Won Jae Kim; Tae Keun Yoo; Jee Soo Park; Jai Won Chung; Yong-ho Lee; Eun Seok Kang; Deok Won Kim

doi:10.1155/2014/618976

Comput Math Methods Med. 2014;2014:618976. doi: 10.1155/2014/618976. Epub 2014 Jul 16.

Authors

Soo Beom Choi¹, Won Jae Kim², Tae Keun Yoo³, Jee Soo Park³, Jai Won Chung⁴, Yong-ho Lee⁵, Eun Seok Kang⁵, Deok Won Kim⁴

Affiliations

¹ Department of Medical Engineering, Yonsei University College of Medicine, 50-1 Yonsei-ro, Seodaemun-gu, Seoul 120-752, Republic of Korea ; Brain Korea 21 PLUS Project for Medical Science, Yonsei University, Republic of Korea.
² Department of Medicine, Yonsei University College of Medicine, Republic of Korea.
³ Department of Medical Engineering, Yonsei University College of Medicine, 50-1 Yonsei-ro, Seodaemun-gu, Seoul 120-752, Republic of Korea ; Department of Medicine, Yonsei University College of Medicine, Republic of Korea.
⁴ Department of Medical Engineering, Yonsei University College of Medicine, 50-1 Yonsei-ro, Seodaemun-gu, Seoul 120-752, Republic of Korea ; Graduate Program in Biomedical Engineering, Yonsei University, Seoul, Republic of Korea.
⁵ Department of Internal Medicine, Yonsei University Health System, Republic of Korea.

Abstract

The global prevalence of diabetes is rapidly increasing. Studies support the necessity of screening and interventions for prediabetes, which can lead to diabetes and serious complications. This study aimed to develop an intelligence-based screening model for prediabetes. Data from the Korean National Health and Nutrition Examination Survey (KNHANES) were used, excluding subjects with diabetes. The KNHANES 2010 data (n = 4685) were used for training and internal validation, while the KNHANES 2011 data (n = 4566) were used for external validation. We developed two models to screen for prediabetes, one using an artificial neural network (ANN) and one using a support vector machine (SVM), and systematically evaluated them using internal and external validation. We compared the performance of our models with that of a previously developed screening score model for prediabetes based on logistic regression analysis. In the external dataset, the SVM model showed an area under the curve of 0.731, higher than those of the ANN model (0.729) and the screening score model (0.712). The prescreening methods developed in this study performed better than the previously developed screening score model and may be a more effective method for prediabetes screening.
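The abstract compares the models by area under the ROC curve (AUC). As a minimal sketch of what that metric measures, and not the authors' implementation, the Python below computes AUC as the fraction of positive/negative pairs that a model ranks correctly (the Mann-Whitney formulation). The function name `roc_auc`, the two hypothetical models, and all numbers are made up for illustration and are not KNHANES data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney pair-counting formulation.

    labels: sequence of 0/1 (1 = prediabetes in this illustration)
    scores: sequence of model risk scores, higher = more likely positive
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, made-up data (NOT from KNHANES): six subjects, two hypothetical models.
y_true = [0, 0, 1, 0, 1, 1]
model_a = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70]
model_b = [0.30, 0.60, 0.35, 0.20, 0.80, 0.50]

auc_a = roc_auc(y_true, model_a)  # 8 of 9 pairs ranked correctly
auc_b = roc_auc(y_true, model_b)  # 7 of 9 pairs ranked correctly
print(f"model A AUC = {auc_a:.3f}, model B AUC = {auc_b:.3f}")
```

Under this reading, an AUC of 0.731 means that a randomly chosen subject with prediabetes receives a higher score than a randomly chosen subject without it about 73% of the time. The pairwise loop is quadratic in sample size, so for datasets of several thousand subjects a rank-based or library implementation would be the more practical choice.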

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Adult
Area Under Curve
Humans
Male
Neural Networks, Computer*
Prediabetic State / diagnosis*
ROC Curve
Random Allocation
Republic of Korea
Risk Factors
Support Vector Machine*