The Value of Unstructured Electronic Health Record Data in Geriatric Syndrome Case Identification

J Am Geriatr Soc. 2018 Aug;66(8):1499-1507. doi: 10.1111/jgs.15411. Epub 2018 Jul 4.


Objectives: To examine the value of unstructured electronic health record (EHR) data (free-text notes) in identifying a set of geriatric syndromes.

Design: Retrospective analysis of unstructured EHR notes using a natural language processing (NLP) algorithm.

Setting: Large multispecialty group.

Participants: Older adults (N=18,341; average age 75.9, 58.9% female).

Measurements: We compared the number of geriatric syndrome cases identified using structured claims data, structured EHR data, and unstructured EHR data. We also calculated these rates using a population-level claims database as a reference and identified comparable epidemiological rates in peer-reviewed literature as a benchmark.
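The comparison described in the measurements can be illustrated with a minimal sketch. It assumes each data source yields a set of patient identifiers flagged for a syndrome, and that the quantity of interest is the rate with notes included divided by the combined structured rate. The cohort size, the patient IDs, and the names `prevalence`, `claims_cases`, `structured_ehr_cases`, and `nlp_cases` are all hypothetical and are not taken from the study.

```python
def prevalence(cases, population_size):
    """Return the percentage of the population flagged as a case."""
    return 100.0 * len(cases) / population_size


# Hypothetical toy cohort of 1,000 patients; all IDs below are invented.
population_size = 1000
claims_cases = {1, 2, 3, 4}                       # flagged in insurance claims
structured_ehr_cases = {3, 4, 5}                  # flagged in structured EHR fields
nlp_cases = {4, 5, 6, 7, 8, 9, 10, 11, 12}        # flagged in free-text notes

# A patient counts once even if several sources flag them, hence set union.
structured_combined = claims_cases | structured_ehr_cases
all_sources = structured_combined | nlp_cases

structured_rate = prevalence(structured_combined, population_size)
all_rate = prevalence(all_sources, population_size)

# How many times as many cases are found once notes are included.
fold = all_rate / structured_rate

print(f"structured only: {structured_rate:.2f}%")
print(f"with notes:      {all_rate:.2f}%")
print(f"fold increase:   {fold:.1f}x")
```

Using set union rather than summing counts avoids double-counting patients who appear in more than one source, which matters whenever the sources overlap.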

Results: Using insurance claims data resulted in a geriatric syndrome prevalence ranging from 0.03% for lack of social support to 8.3% for walking difficulty. Using structured EHR data resulted in similar prevalence rates, ranging from 0.03% for malnutrition to 7.85% for walking difficulty. Incorporating unstructured EHR notes, enabled by applying the NLP algorithm, identified considerably higher rates of geriatric syndromes: absence of fecal control (2.1%, 2.3 times as much as structured claims and EHR data combined), decubitus ulcer (1.4%, 1.7 times as much), dementia (6.7%, 1.5 times as much), falls (23.6%, 3.2 times as much), malnutrition (2.5%, 18.0 times as much), lack of social support (29.8%, 455.9 times as much), urinary retention (4.2%, 3.9 times as much), vision impairment (6.2%, 7.4 times as much), weight loss (19.2%, 2.9 times as much), and walking difficulty (36.34%, 3.4 times as much). The geriatric syndrome rates extracted from structured data were substantially lower than published epidemiological rates, although adding the NLP results considerably closed this gap.
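The results give each rate with notes included together with its multiple of the combined structured rate, but not the combined structured rate itself. As a rough sketch, that baseline can be back-calculated by dividing one by the other. This assumes the multiple is the rate with notes divided by the combined claims and structured EHR rate. Because the published figures are rounded, the derived values are approximations rather than reported results. The names `reported` and `implied_structured_rate` are illustrative.

```python
# (rate with notes in %, multiple of combined structured data), as reported above.
reported = {
    "absence of fecal control": (2.1, 2.3),
    "decubitus ulcer": (1.4, 1.7),
    "dementia": (6.7, 1.5),
    "falls": (23.6, 3.2),
    "malnutrition": (2.5, 18.0),
    "lack of social support": (29.8, 455.9),
    "urinary retention": (4.2, 3.9),
    "vision impairment": (6.2, 7.4),
    "weight loss": (19.2, 2.9),
    "walking difficulty": (36.34, 3.4),
}


def implied_structured_rate(rate_with_notes, multiple):
    """Approximate combined structured prevalence (%) implied by the reported pair."""
    return rate_with_notes / multiple


for syndrome, (rate, multiple) in reported.items():
    baseline = implied_structured_rate(rate, multiple)
    print(f"{syndrome}: ~{baseline:.2f}% structured vs {rate}% with notes")
```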

Conclusion: Claims and structured EHR data give an incomplete picture of burden related to geriatric syndromes. Geriatric syndromes are likely to be missed if unstructured data are not analyzed. Pragmatic NLP algorithms can assist with identifying individuals at high risk of experiencing geriatric syndromes and improving coordination of care for older adults.

Keywords: case identification; electronic health records; geriatric syndromes; natural language processing and text-mining; unstructured free-text data.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Aged
  • Aged, 80 and over
  • Algorithms
  • Databases, Factual
  • Electronic Health Records / statistics & numerical data*
  • Female
  • Frail Elderly / statistics & numerical data*
  • Frailty / epidemiology*
  • Humans
  • Male
  • Mobility Limitation
  • Natural Language Processing
  • Prevalence
  • Retrospective Studies
  • Social Support
  • Syndrome