Develop and validate a computable phenotype for identifying alcohol-use disorder patients using structured and unstructured EHR data

Alcohol Alcohol. 2025 Nov 16;61(1):agaf086. doi: 10.1093/alcalc/agaf086.

Abstract

Background: Alcohol Use Disorder (AUD) drives significant morbidity through alcohol-related liver disease. Accurate AUD identification in electronic health records is critical for research and care delivery, yet International Classification of Diseases (ICD) code-based algorithms miss many cases while manual review is impractical at scale. Computable phenotypes (CPs) integrating structured and unstructured EHR data offer a scalable solution.

Methods: Using University of Florida Health's Integrated Data Repository covering two million patients, we developed AUD CPs through a two-step process. First, candidate cohorts were identified using AUD-related ICD codes, medications, and keyword searches across structured and unstructured data. Second, rule-based combinations were iteratively refined through manual chart review. Final algorithms were evaluated against gold-standard chart review, measuring sensitivity, positive predictive value (PPV), and F1-score, then validated in an independent testing set and an external dataset.
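The two-step process described above can be illustrated with a minimal sketch. It is not the authors' algorithm: the code prefix, medication list, keywords, `Patient` structure, and both rules are hypothetical placeholders chosen only to show how evidence from structured and unstructured sources might be combined into rules and scored against chart-review labels.

```python
from dataclasses import dataclass, field

# Hypothetical inputs: none of these lists come from the paper.
AUD_ICD_PREFIXES = ("F10",)  # ICD-10-CM alcohol-related disorders
AUD_MEDICATIONS = {"naltrexone", "acamprosate", "disulfiram"}
AUD_KEYWORDS = ("alcohol use disorder", "alcohol dependence", "alcohol abuse")


@dataclass
class Patient:
    icd_codes: list = field(default_factory=list)    # structured diagnoses
    medications: list = field(default_factory=list)  # structured orders
    notes: list = field(default_factory=list)        # unstructured clinical text


def evidence(patient):
    """Return (has_icd, has_medication, has_keyword) for one patient."""
    has_icd = any(c.upper().startswith(AUD_ICD_PREFIXES) for c in patient.icd_codes)
    has_med = any(m.lower() in AUD_MEDICATIONS for m in patient.medications)
    has_kw = any(k in note.lower() for note in patient.notes for k in AUD_KEYWORDS)
    return has_icd, has_med, has_kw


def sensitive_rule(patient):
    """Flag on any single source of evidence (favors sensitivity)."""
    return any(evidence(patient))


def precise_rule(patient):
    """Require at least two independent sources of evidence (favors PPV)."""
    return sum(evidence(patient)) >= 2


def evaluate(predictions, gold):
    """Compare rule output with chart-review labels (both lists of bool)."""
    tp = sum(p and g for p, g in zip(predictions, gold))
    fp = sum(p and not g for p, g in zip(predictions, gold))
    fn = sum(g and not p for p, g in zip(predictions, gold))
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity) if ppv + sensitivity else 0.0
    return {"sensitivity": sensitivity, "ppv": ppv, "f1": f1}
```

In this sketch a patient with only a naltrexone order is flagged by `sensitive_rule` but not by `precise_rule`, which mirrors the trade-off between a sensitivity-leaning and a PPV-leaning phenotype. Plain substring matching ignores negation ("denies alcohol abuse"), so a real pipeline would need more careful text processing.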

Results: The F1-optimized CP achieved an F1-score of .87 (sensitivity: .98, PPV: .78) in the testing set, while the precision-optimized CP achieved a PPV of .9 (sensitivity: .68, F1-score: .77). Minimal performance attenuation between training and testing sets demonstrated robustness and generalizability. Both CPs substantially outperformed restricted AUD-specific ICD code-based approaches.
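As a consistency check, the reported F1-scores can be recomputed from the reported sensitivity and PPV, since F1 is the harmonic mean of the two. The helper name `f1_score` below is an illustrative choice, and the inputs are the rounded values from the abstract, so agreement is expected only to two decimals.

```python
def f1_score(sensitivity, ppv):
    """Harmonic mean of sensitivity (recall) and PPV (precision)."""
    return 2 * sensitivity * ppv / (sensitivity + ppv)


# Rounded testing-set values as reported in the abstract.
f1_optimized = f1_score(sensitivity=0.98, ppv=0.78)
precision_optimized = f1_score(sensitivity=0.68, ppv=0.90)

print(f"{f1_optimized:.2f}")         # 0.87
print(f"{precision_optimized:.2f}")  # 0.77
```

Both recomputed values match the reported F1-scores of .87 and .77.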

Conclusions: CPs integrating structured and unstructured EHR data enable accurate, reproducible AUD identification, surpassing traditional AUD-specific ICD-based methods. This approach facilitates efficient cohort construction for clinical research, public health surveillance, and quality improvement initiatives targeting AUD and its consequences, addressing a critical gap in identifying patients who may benefit from screening and intervention.

Keywords: alcohol use disorder; computable phenotype; natural language processing; real-world evidence.

Publication types

  • Validation Study

MeSH terms

  • Adult
  • Alcoholism* / diagnosis
  • Alcoholism* / epidemiology
  • Algorithms
  • Electronic Health Records*
  • Female
  • Humans
  • International Classification of Diseases
  • Male
  • Middle Aged
  • Phenotype*