Development of a Natural Language Processing Algorithm to Identify and Evaluate Transgender Patients in Electronic Health Record Systems

Ethn Dis. 2019 Jun 13;29(Suppl 2):441-450. doi: 10.18865/ed.29.S2.441. eCollection 2019.


Objective: To create a natural language processing (NLP) algorithm to identify transgender patients in electronic health records.

Design: We developed an NLP algorithm that identifies patients using a combination of keywords and billing codes. Identified patients were manually reviewed, and the health care services they used were categorized by billing code.
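The identification and categorization steps described in the Design can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the published algorithm. The keyword list, the use of the ICD-10-CM prefix F64 (gender identity disorders) as the billing-code signal, the `Patient` record, and the helper names `flag_candidates` and `categorize_services` are all hypothetical. The abstract does not say whether keywords and billing codes were combined with AND or OR, so the sketch exposes both options through a hypothetical `require_both` flag.

```python
import re
from collections import Counter
from dataclasses import dataclass, field

# HYPOTHETICAL keyword list for illustration; the study's actual terms are not given in the abstract.
KEYWORDS = ["transgender", "gender dysphoria", "gender identity disorder"]
# HYPOTHETICAL billing-code signal: ICD-10-CM F64.x covers gender identity disorders.
CODE_PREFIXES = ("F64",)

# Case-insensitive, whole-word match on any keyword.
KEYWORD_RE = re.compile(r"\b(" + "|".join(map(re.escape, KEYWORDS)) + r")\b", re.IGNORECASE)


@dataclass
class Patient:
    patient_id: str
    notes: list[str] = field(default_factory=list)          # free-text clinical notes
    billing_codes: list[str] = field(default_factory=list)  # e.g. ICD-10-CM codes


def has_keyword(p: Patient) -> bool:
    """True if any note mentions a keyword (no negation handling)."""
    return any(KEYWORD_RE.search(note) for note in p.notes)


def has_code(p: Patient) -> bool:
    """True if any billing code starts with one of the target prefixes."""
    return any(code.upper().startswith(CODE_PREFIXES) for code in p.billing_codes)


def flag_candidates(patients: list[Patient], require_both: bool = False) -> list[str]:
    """Return IDs of patients flagged for manual chart review.

    require_both=True demands keyword AND code; otherwise either signal suffices.
    """
    flagged = []
    for p in patients:
        kw, code = has_keyword(p), has_code(p)
        if (kw and code) if require_both else (kw or code):
            flagged.append(p.patient_id)
    return flagged


def categorize_services(patients: list[Patient], code_to_category: dict[str, str]) -> Counter:
    """Count patients per service category, counting each category once per patient."""
    counts: Counter = Counter()
    for p in patients:
        categories = {code_to_category[c] for c in p.billing_codes if c in code_to_category}
        counts.update(categories)
    return counts


# Usage with toy records (not study data).
patients = [
    Patient("a", notes=["Pt is a transgender woman on estradiol."], billing_codes=["F64.0"]),
    Patient("b", notes=["Follow-up for type 2 diabetes."], billing_codes=["E11.9"]),
    Patient("c", notes=["History of gender dysphoria."], billing_codes=[]),
]
print(flag_candidates(patients))                     # ['a', 'c']
print(flag_candidates(patients, require_both=True))  # ['a']
```

Plain keyword matching cannot distinguish a negated or incidental mention from an affirmative one, which is one reason a manual review step is needed to catch false positives among flagged patients.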

Setting: Vanderbilt University Medical Center.

Participants: 234 adult and pediatric transgender patients.

Main outcome measures: Number of transgender patients correctly identified and categorization of health services utilized.

Results: We identified 234 transgender patients, of whom 50% had a diagnosed mental health condition, 14% were living with HIV, and 7% had diabetes. Nearly half of patients attended the Endocrinology/Diabetes/Metabolism clinic, largely driven by hormone use. Many patients also attended the Psychiatry, HIV, and/or Obstetrics/Gynecology clinics. The false positive rate of our algorithm was 3%.

Conclusions: Our novel algorithm correctly identified transgender patients and provided important insights into health care utilization among this marginalized population.

Keywords: Electronic Health Records; Natural Language Processing; Transgender; Utilization.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Adolescent
  • Adult
  • Aged
  • Aged, 80 and over
  • Algorithms*
  • Child
  • Electronic Health Records / statistics & numerical data*
  • Female
  • Humans
  • Male
  • Middle Aged
  • Natural Language Processing*
  • Transgender Persons / statistics & numerical data*
  • Young Adult