Bridging the Granularity Gap in Family History Information Extracted from Clinical Narratives

AMIA Annu Symp Proc. 2023 Apr 29:2022:795-804. eCollection 2022.


Family history (FH) is important for disease risk assessment and prevention. However, incorporating FH information derived from electronic health records (EHRs) into downstream analytics is challenging because the information is not standardized. We aimed to automatically align FH concepts derived from a clinical corpus to widely used disease category resources, including Clinical Classification System (CCS), Phecode, Comparative Toxicogenomics Database (CTD), Human Phenotype Ontology, and Human Disease Ontology (HDO). Leveraging the parent and broader/alike relations available in the Unified Medical Language System (UMLS), we achieved high mapping coverage of FH concepts in those resources. Among the five resources, CTD has the best coverage of FH concepts (93%), HDO has the coarsest granularity of FH disease categories, and CCS has the finest. The study suggests that these ontologies and terminological resources can mitigate the challenge posed by the varying granularity of NLP-derived FH information.
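The mapping idea described above can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes the relations have already been loaded as (narrower concept, relation label, broader concept) triples, with labels modeled on the UMLS parent (PAR), broader (RB), and alike (RL) relations. All concept identifiers, the function names, and the max_hops cutoff are hypothetical placeholders chosen for illustration.

```python
from collections import deque

# Toy triples: (narrower concept, relation label, broader concept).
# Identifiers are placeholders, NOT real UMLS CUIs.
RELATIONS = [
    ("C_BREAST_CA", "PAR", "C_NEOPLASM"),
    ("C_T2DM", "RB", "C_DIABETES"),
    ("C_DIABETES", "PAR", "C_ENDOCRINE"),
    ("C_MIGRAINE", "RL", "C_HEADACHE"),
]
ALLOWED = {"PAR", "RB", "RL"}  # assumed relation labels to traverse


def build_graph(relations, allowed=ALLOWED):
    """Index broader/alike neighbours by concept, keeping only allowed relations."""
    graph = {}
    for narrower, rel, broader in relations:
        if rel in allowed:
            graph.setdefault(narrower, []).append(broader)
    return graph


def map_to_category(concept, graph, categories, max_hops=5):
    """Breadth-first search for the nearest concept that belongs to a resource.

    Returns (category, hops) or None if no category is reachable.
    """
    queue = deque([(concept, 0)])
    seen = {concept}
    while queue:
        node, hops = queue.popleft()
        if node in categories:
            return node, hops
        if hops == max_hops:
            continue
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None


def coverage(concepts, graph, categories):
    """Fraction of FH concepts that reach at least one category."""
    mapped = {c: map_to_category(c, graph, categories) for c in concepts}
    hits = sum(1 for v in mapped.values() if v is not None)
    return hits / len(concepts), mapped


graph = build_graph(RELATIONS)
resource_categories = {"C_NEOPLASM", "C_ENDOCRINE"}  # hypothetical resource
fh_concepts = ["C_BREAST_CA", "C_T2DM", "C_MIGRAINE"]
cov, mapped = coverage(fh_concepts, graph, resource_categories)
```

In this toy setting, the number of hops to the matched category gives a rough sense of how much coarser the resource is than the extracted concept, and concepts that return None count against coverage.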

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Databases, Factual
  • Electronic Health Records
  • Humans
  • Narration*
  • Natural Language Processing
  • Risk Assessment
  • Unified Medical Language System*