Data mining process for predicting diabetes mellitus based model about other chronic diseases: a case study of the northwestern part of Nigeria

Healthc Technol Lett. 2019 Jul 9;6(4):98-102. doi: 10.1049/htl.2018.5111. eCollection 2019 Aug.

Abstract

This study applies data mining (DM) techniques to predict diabetes mellitus using a dataset collected from the seven northwestern states of Nigeria. Data were collected from both primary and secondary sources, through questionnaires and verbal interviews with patients who have diabetes mellitus and other chronic diseases. Hospital records of the patients involved in this work were also used. The dataset comprises 281 instances with 8 attributes. R programming software (version 5.3.1) was used in the experiments. The DM techniques used in this research were binomial logistic regression, classification, the confusion matrix and the correlation coefficient. The data were partitioned into training and testing sets: the training data were used to build the model, and the testing data were used to validate it. The fitting algorithm for the best-fitted model converged with a null deviance of 281.951, a residual deviance of 16.476 and an AIC of 30.476. The significant variables are AGE, GLU, DBP and KDYP, with P values of 0.025, 0.01, 0.05 and 0.025, respectively. The fitted model achieved an accuracy of ∼97.1%. The correlation analysis revealed that diabetic patients are more likely to be hypertensive than patients with other chronic diseases considered in the research.
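The fit statistics quoted in the abstract can be related to one another with a short sketch. It is written in Python purely for illustration (the study itself used R), and every function name and the toy data are hypothetical, not taken from the paper. It assumes the standard convention for a 0/1 response, in which the residual deviance equals -2 times the log-likelihood and AIC = deviance + 2k for k estimated coefficients. Under that assumption, the reported AIC and residual deviance would imply about 7 estimated coefficients; this is an inference, not a figure stated in the abstract.

```python
import math


def binomial_deviance(y_true, p_hat):
    """Deviance for a 0/1 response: -2 * log-likelihood.

    For ungrouped binary data the saturated log-likelihood is 0,
    so the deviance reduces to this expression.
    """
    eps = 1e-12  # guard against log(0)
    log_lik = 0.0
    for y, p in zip(y_true, p_hat):
        p = min(max(p, eps), 1.0 - eps)
        log_lik += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -2.0 * log_lik


def aic(deviance, n_params):
    """AIC for a binary-response model: deviance + 2 * (number of coefficients)."""
    return deviance + 2 * n_params


def implied_param_count(aic_value, deviance):
    """Invert AIC = deviance + 2k to recover k."""
    return (aic_value - deviance) / 2.0


def confusion_matrix(y_true, p_hat, threshold=0.5):
    """Count true/false positives/negatives at a probability threshold."""
    cm = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, p in zip(y_true, p_hat):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            cm["tp"] += 1
        elif pred == 1 and y == 0:
            cm["fp"] += 1
        elif pred == 0 and y == 0:
            cm["tn"] += 1
        else:
            cm["fn"] += 1
    return cm


def accuracy(cm):
    """Share of correctly classified instances."""
    return (cm["tp"] + cm["tn"]) / sum(cm.values())


# Figures reported in the abstract: AIC 30.476, residual deviance 16.476.
k_implied = implied_param_count(30.476, 16.476)

# Toy labels and predicted probabilities (made up for illustration only).
toy_y = [1, 0, 1, 1, 0]
toy_p = [0.9, 0.2, 0.8, 0.4, 0.1]
toy_cm = confusion_matrix(toy_y, toy_p)
toy_acc = accuracy(toy_cm)
toy_dev = binomial_deviance(toy_y, toy_p)
```

The accuracy of ∼97.1% reported in the abstract would be the same ratio, (true positives + true negatives) / total, computed from the confusion matrix on the testing set.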

Keywords: DM techniques; Nigeria; best-fitted model converges; biomedical measurement; chronic diseases; confusion matrix; correlation coefficient; data mining; data mining process; dataset; diabetes mellitus based model; diabetes mellitus model data mining based approaches; diabetic mellitus; diabetic patients; diseases; hospital data; medical computing; medical diagnostic computing; medical disorders; northwestern part; patient diagnosis; patient treatment; pattern classification; predicted model; primary sources; regression analysis; secondary sources; seven northwestern states; testing sets; training data.