Screening diabetes mellitus 2 based on electronic health records using temporal features

Health Informatics J. 2018 Jun;24(2):194-205. doi: 10.1177/1460458216663023. Epub 2016 Aug 26.


The prevalence of type 2 diabetes mellitus is increasing worldwide. Current methods of treating diabetes remain inadequate, and therefore, prevention through screening is the most appropriate way to reduce the burden of diabetes and its complications. We propose a new prognostic approach for type 2 diabetes mellitus based on electronic health records that does not rely on the invasive tests currently associated with the disease (e.g. glucose level or glycated hemoglobin (HbA1c)). Our methodology is based on machine learning frameworks with data enrichment using temporal features. As a result, our predictive model, using a random forest classifier, achieved an area under the receiver operating characteristic curve of 84.22 percent when including data from 2009 to 2011 to predict diabetic patients in 2012, 83.19 percent when including temporal features, and 83.72 percent after applying temporal features and feature selection. We conclude that the pathology prediction is possible and efficient using the patient's progression information over the years and without using the invasive techniques that are currently used for type 2 diabetes mellitus classification.
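The abstract does not list the temporal features or the evaluation code that were used, so the following is only a minimal sketch of the general idea, under stated assumptions. It condenses a hypothetical yearly measurement (for example, a non-invasive value recorded in 2009, 2010 and 2011) into three illustrative temporal features and computes the area under the ROC curve from its rank-based definition. The function names, the feature set and the sample values are all assumptions made for illustration, and no random forest is trained here.

```python
def temporal_features(yearly_values):
    """Summarise one patient's yearly measurements (oldest first).

    The three features are illustrative assumptions, not the paper's set.
    Requires at least two yearly values.
    """
    n = len(yearly_values)
    x_bar = (n - 1) / 2                      # mean of year indices 0..n-1
    y_bar = sum(yearly_values) / n           # mean level over the period
    # Ordinary least-squares slope: average change per year.
    numerator = sum((x - x_bar) * (y - y_bar)
                    for x, y in enumerate(yearly_values))
    denominator = sum((x - x_bar) ** 2 for x in range(n))
    return {
        "mean": y_bar,
        "delta": yearly_values[-1] - yearly_values[0],
        "slope": numerator / denominator,
    }


def roc_auc(labels, scores):
    """AUC as the probability that a positive case outranks a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical records: (values for 2009-2011, diabetic in 2012?)
    records = [
        ([27.0, 28.5, 30.0], 1),
        ([24.0, 24.1, 23.9], 0),
        ([31.0, 31.5, 33.0], 1),
        ([29.0, 28.0, 27.5], 0),
    ]
    labels = [label for _, label in records]
    # Use the slope alone as a crude risk score, purely for illustration.
    scores = [temporal_features(values)["slope"] for values, _ in records]
    print("AUC of slope-only score:", roc_auc(labels, scores))
```

In a fuller pipeline, features of this kind would be appended to each patient's record before training a classifier, and the classifier's predicted probabilities would take the place of the slope-only score when computing the AUC.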

Keywords: classification; database; diabetes mellitus 2; electronic health record; prognostic tool; screening.

MeSH terms

  • Area Under Curve
  • Blood Glucose / analysis
  • Diabetes Mellitus, Type 2 / classification
  • Diabetes Mellitus, Type 2 / diagnosis*
  • Electronic Health Records / statistics & numerical data*
  • Female
  • Glycated Hemoglobin A / analysis
  • Humans
  • Male
  • Mass Screening / methods*
  • Mass Screening / standards
  • Middle Aged
  • Predictive Value of Tests
  • ROC Curve
  • Risk Factors

Substances

  • Blood Glucose
  • Glycated Hemoglobin A
  • hemoglobin A1c protein, human