Clinical Note Section Detection Using a Hidden Markov Model of Unified Medical Language System Semantic Types

AMIA Annu Symp Proc. 2022 Feb 21;2021:418-427. eCollection 2021.


Clinical notes are a rich source of biomedical data for natural language processing (NLP). The identification of note sections represents a first step in creating portable NLP tools. Here, a system that used a heterogeneous hidden Markov model (HMM) was designed to identify seven note sections: (1) Medical History, (2) Medications, (3) Family and Social History, (4) Physical Exam, (5) Labs and Imaging, (6) Assessment and Plan, and (7) Review of Systems. Unified Medical Language System (UMLS) concepts were identified using MetaMap, and UMLS semantic type distributions for each section type were empirically determined. The UMLS semantic type distributions were used to train the HMM for identifying clinical note sections. The system was evaluated relative to a template boundary model using manually annotated notes from the Medical Information Mart for Intensive Care III. The results show promise for an approach to segment clinical notes into sections for subsequent NLP tasks.
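As a rough illustration of the decoding step such a system relies on, the sketch below runs Viterbi decoding over a toy HMM in which hidden states are note sections and observations are UMLS semantic type abbreviations (phsu = Pharmacologic Substance, bpoc = Body Part, Organ, or Organ Component, fndg = Finding, lbpr = Laboratory Procedure). It is not the authors' implementation. Only three of the seven sections are modeled, each line of a note is assumed to reduce to a single semantic type, and every probability is an invented placeholder rather than a value estimated from MIMIC-III or reported in the paper.

```python
import math

# Hidden states: a subset of the seven sections named in the abstract.
SECTIONS = ["Medications", "Physical Exam", "Labs and Imaging"]

# All probabilities below are invented placeholders for illustration only.
START = {"Medications": 0.5, "Physical Exam": 0.3, "Labs and Imaging": 0.2}

TRANS = {
    "Medications": {"Medications": 0.7, "Physical Exam": 0.2, "Labs and Imaging": 0.1},
    "Physical Exam": {"Medications": 0.1, "Physical Exam": 0.7, "Labs and Imaging": 0.2},
    "Labs and Imaging": {"Medications": 0.1, "Physical Exam": 0.1, "Labs and Imaging": 0.8},
}

# Emission distributions over UMLS semantic type abbreviations, per section.
EMIT = {
    "Medications": {"phsu": 0.7, "bpoc": 0.1, "fndg": 0.1, "lbpr": 0.1},
    "Physical Exam": {"phsu": 0.05, "bpoc": 0.5, "fndg": 0.4, "lbpr": 0.05},
    "Labs and Imaging": {"phsu": 0.1, "bpoc": 0.1, "fndg": 0.2, "lbpr": 0.6},
}


def viterbi(observations):
    """Return the most likely section label for each observed semantic type."""
    if not observations:
        return []

    # Log-probability of the best path ending in each section at position 0.
    score = {
        s: math.log(START[s]) + math.log(EMIT[s][observations[0]])
        for s in SECTIONS
    }
    backpointers = []

    for obs in observations[1:]:
        prev = score
        score = {}
        pointers = {}
        for s in SECTIONS:
            # Best predecessor section for a path that ends in s at this step.
            best_prev = max(SECTIONS, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            score[s] = (
                prev[best_prev]
                + math.log(TRANS[best_prev][s])
                + math.log(EMIT[s][obs])
            )
            pointers[s] = best_prev
        backpointers.append(pointers)

    # Trace the best path backwards from the highest-scoring final section.
    path = [max(SECTIONS, key=lambda s: score[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    note_lines = ["phsu", "phsu", "bpoc", "fndg", "lbpr", "lbpr"]
    for sem_type, section in zip(note_lines, viterbi(note_lines)):
        print(f"{sem_type}\t{section}")
```

With these placeholder values, the six-line toy note is segmented into two Medications lines, two Physical Exam lines, and two Labs and Imaging lines. In practice the emission distributions would be estimated from the semantic types that MetaMap assigns to annotated sections, and a line would typically carry several concepts rather than one.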

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Humans
  • Natural Language Processing
  • Semantics*
  • Unified Medical Language System*