Validating an ontology-based algorithm to identify patients with type 2 diabetes mellitus in electronic health records

Int J Med Inform. 2014 Oct;83(10):768-78. doi: 10.1016/j.ijmedinf.2014.06.002. Epub 2014 Jun 20.


Background: Improving healthcare for people with chronic conditions requires clinical information systems that support integrated care and information exchange, emphasizing a semantic approach to support multiple and disparate Electronic Health Records (EHRs). Using a literature review, the Australian National Guidelines for Type 2 Diabetes Mellitus (T2DM), SNOMED-CT-AU and input from health professionals, we developed a Diabetes Mellitus Ontology (DMO) to diagnose and manage patients with diabetes. This paper describes the manual validation of the DMO-based approach using real world EHR data from a general practice (n=908 active patients) participating in the electronic Practice Based Research Network (ePBRN).

Method: The DMO-based algorithm to query, using Semantic Protocol and RDF Query Language (SPARQL), the structured fields in the ePBRN data repository were iteratively tested and refined. The accuracy of the final DMO-based algorithm was validated with a manual audit of the general practice EHR. Contingency tables were prepared and Sensitivity and Specificity (accuracy) of the algorithm to diagnose T2DM measured, using the T2DM cases found by manual EHR audit as the gold standard. Accuracy was determined with three attributes - reason for visit (RFV), medication (Rx) and pathology (path) - singly and in combination.

Results: The Sensitivity and Specificity of the algorithm were 100% and 99.88% with RFV; 96.55% and 98.97% with Rx; and 15.6% and 98.92% with Path. This suggests that Rx and Path data were not as complete or correct as the RFV for this general practice, which kept its RFV information complete and current for diabetes. However, the completeness is good enough for this purpose as confirmed by the very small relative deterioration of the accuracy (Sensitivity and Specificity of 97.67% and 99.18%) when calculated for the combination of RFV, Rx and Path. The manual EHR audit suggested that the accuracy of the algorithm was influenced by data quality such as incorrect data due to mistaken units of measurement and unavailable data due to non-documentation or documented in the wrong place or progress notes, problems with data extraction, encryption and data management errors.

Conclusion: This DMO-based algorithm is sufficiently accurate to support a semantic approach, using the RFV, Rx and Path to define patients with T2DM from EHR data. However, the accuracy can be compromised by incomplete or incorrect data. The extent of compromise requires further study, using ontology-based and other approaches.

Keywords: Diabetes Mellitus; Electronic Health Records; Ontology; SPARQL; Type 2; Validation studies.

Publication types

  • Validation Study

MeSH terms

  • Algorithms*
  • Diabetes Mellitus, Type 2 / diagnosis*
  • Electronic Health Records*
  • Humans