Application of support vector machine modeling for prediction of common diseases: the case of diabetes and pre-diabetes
- PMID: 20307319
- PMCID: PMC2850872
- DOI: 10.1186/1472-6947-10-16
Abstract
Background: We present a potentially useful alternative approach based on support vector machine (SVM) techniques to classify persons with and without common diseases. We illustrate the method by applying it to detect persons with diabetes and pre-diabetes in a cross-sectional, representative sample of the U.S. population.
Methods: We used data from the 1999-2004 National Health and Nutrition Examination Survey (NHANES) to develop and validate SVM models for two classification schemes: Classification Scheme I (diagnosed or undiagnosed diabetes vs. pre-diabetes or no diabetes) and Classification Scheme II (undiagnosed diabetes or pre-diabetes vs. no diabetes). The SVM models were used to select sets of variables that would yield the best classification of individuals into these diabetes categories.
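The two classification schemes can be read as two ways of turning a four-level diabetes status into a binary label. The sketch below is illustrative only: the `Status` enum and both function names are hypothetical, and it assumes that persons with diagnosed diabetes are left out of Classification Scheme II, since that scheme names only undiagnosed diabetes, pre-diabetes, and no diabetes.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    """Hypothetical four-level diabetes status for one survey participant."""
    DIAGNOSED_DIABETES = "diagnosed diabetes"
    UNDIAGNOSED_DIABETES = "undiagnosed diabetes"
    PRE_DIABETES = "pre-diabetes"
    NO_DIABETES = "no diabetes"


def scheme_one_label(status: Status) -> int:
    """Scheme I: diagnosed or undiagnosed diabetes (1) vs. pre-diabetes or no diabetes (0)."""
    diabetes = (Status.DIAGNOSED_DIABETES, Status.UNDIAGNOSED_DIABETES)
    return 1 if status in diabetes else 0


def scheme_two_label(status: Status) -> Optional[int]:
    """Scheme II: undiagnosed diabetes or pre-diabetes (1) vs. no diabetes (0).

    Assumption: diagnosed diabetes belongs to neither group, so None is
    returned and the caller drops that person from the Scheme II data set.
    """
    if status is Status.DIAGNOSED_DIABETES:
        return None
    at_risk = (Status.UNDIAGNOSED_DIABETES, Status.PRE_DIABETES)
    return 1 if status in at_risk else 0


if __name__ == "__main__":
    for status in Status:
        print(status.value, scheme_one_label(status), scheme_two_label(status))
```

Under this reading, the same person can be a negative case in Scheme I and a positive case in Scheme II (for example, someone with pre-diabetes), which is why the two schemes are modeled separately.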
Results: For Classification Scheme I, the set of diabetes-related variables with the best classification performance included family history, age, race and ethnicity, weight, height, waist circumference, body mass index (BMI), and hypertension. For Classification Scheme II, two additional variables, sex and physical activity, were included. The discriminative abilities of the SVM models for Classification Schemes I and II, as measured by the area under the receiver operating characteristic (ROC) curve, were 83.5% and 73.2%, respectively. A web-based tool, Diabetes Classifier, was developed to demonstrate a user-friendly application that allows for individual or group assessment with a configurable, user-defined threshold.
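To make the two evaluation ideas in the Results concrete, here is a minimal sketch of how an area under the ROC curve and a user-defined threshold operate on classifier scores. It is not the authors' code or the Diabetes Classifier tool: the function names `roc_auc` and `classify` are hypothetical, the scores stand in for any model output (such as an SVM decision value), and the numbers are made-up toy values, not NHANES data.

```python
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via its rank interpretation.

    The AUC equals the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case, with
    tied scores counted as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def classify(scores: Sequence[float], threshold: float) -> List[int]:
    """Flag a person as positive when the score reaches the chosen threshold."""
    return [1 if s >= threshold else 0 for s in scores]


if __name__ == "__main__":
    # Toy labels and scores, invented purely for illustration.
    toy_labels = [0, 0, 1, 1, 0, 1]
    toy_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]
    print(f"AUC: {roc_auc(toy_labels, toy_scores):.3f}")
    print("threshold 0.5:", classify(toy_scores, 0.5))
    print("threshold 0.3:", classify(toy_scores, 0.3))
```

Lowering the threshold flags more people as positive, trading specificity for sensitivity, while the AUC summarizes performance across all possible thresholds.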
Conclusions: Support vector machine modeling is a promising classification approach for detecting persons with common diseases such as diabetes and pre-diabetes in the population. This approach should be explored further for other complex diseases using commonly available variables.
