Natural Language Processing Improves Detection of Nonsevere Hypoglycemia in Medical Records Versus Coding Alone in Patients With Type 2 Diabetes but Does Not Improve Prediction of Severe Hypoglycemia Events: An Analysis Using the Electronic Medical Record in a Large Health System

Diabetes Care. 2020 Aug;43(8):1937-1940. doi: 10.2337/dc19-1791. Epub 2020 May 15.


Objective: To determine whether natural language processing (NLP) improves detection of nonsevere hypoglycemia (NSH) in patients with type 2 diabetes who have no NSH documented by diagnosis codes, and whether NLP detection improves the prediction of future severe hypoglycemia (SH).

Research design and methods: From 2005 to 2017, we identified NSH events by diagnosis codes and NLP. We then built an SH prediction model.

Results: There were 204,517 patients with type 2 diabetes and no diagnosis codes for NSH. NLP found evidence of NSH in 7,035 (3.4%) of these patients. We reviewed 1,200 of the NLP-detected NSH notes and confirmed NSH in 93%. The SH prediction model (C-statistic 0.806) showed increased risk with NSH (hazard ratio 4.44; P < 0.001). However, adding NLP-detected NSH to the model did not improve SH prediction compared with using diagnosis code-only NSH.
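
The proportions reported in the Results can be checked with a short calculation. The sketch below uses only the counts stated in the abstract; the function and variable names are illustrative and do not come from the study. Because the 93% confirmation rate is a rounded figure, the number of confirmed notes it yields is an approximation, not a reported value.

```python
def summarize_nlp_detection(total_patients, nlp_detected, reviewed_notes, confirmed_fraction):
    """Recompute the summary figures quoted in the Results.

    All names here are hypothetical; only the input counts come from the abstract.
    """
    # Share of patients without an NSH diagnosis code who had NLP evidence of NSH
    detected_pct = 100 * nlp_detected / total_patients
    # Approximate number of reviewed notes confirmed as NSH (93% is rounded)
    approx_confirmed = round(reviewed_notes * confirmed_fraction)
    return detected_pct, approx_confirmed


detected_pct, approx_confirmed = summarize_nlp_detection(
    total_patients=204_517,
    nlp_detected=7_035,
    reviewed_notes=1_200,
    confirmed_fraction=0.93,
)
print(f"NLP-detected NSH: {detected_pct:.1f}% of patients")
print(f"Approximate confirmed notes: {approx_confirmed} of 1,200 reviewed")
```

The computed share agrees with the 3.4% stated above, and a 93% confirmation rate corresponds to roughly 1,116 of the 1,200 reviewed notes.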

Conclusions: Detection of NSH improved with NLP in patients with type 2 diabetes without improving SH prediction.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Adult
  • Aged
  • Aged, 80 and over
  • Algorithms*
  • Clinical Decision Rules
  • Community Health Planning / methods
  • Community Health Planning / organization & administration
  • Diabetes Mellitus, Type 2 / blood
  • Diabetes Mellitus, Type 2 / complications
  • Diabetes Mellitus, Type 2 / epidemiology*
  • Electronic Health Records / statistics & numerical data*
  • Female
  • Humans
  • Hypoglycemia / diagnosis*
  • Hypoglycemia / epidemiology
  • Hypoglycemia / pathology
  • Information Storage and Retrieval / methods
  • Information Storage and Retrieval / standards
  • International Classification of Diseases* / standards
  • Male
  • Middle Aged
  • Natural Language Processing*
  • Predictive Value of Tests
  • Severity of Illness Index
  • United States / epidemiology
  • Young Adult

Associated data

  • figshare/10.2337/figshare.12116709