Clinical notes are a rich source of biomedical data for natural language processing (NLP). Identifying note sections is a first step in creating portable NLP tools. Here, a system based on a heterogeneous hidden Markov model (HMM) was designed to identify seven note sections: (1) Medical History, (2) Medications, (3) Family and Social History, (4) Physical Exam, (5) Labs and Imaging, (6) Assessment and Plan, and (7) Review of Systems. Unified Medical Language System (UMLS) concepts were identified using MetaMap, and the distribution of UMLS semantic types within each section type was determined empirically. These semantic type distributions were used to train the HMM to identify clinical note sections. The system was evaluated against a template boundary model using manually annotated notes from the Medical Information Mart for Intensive Care III (MIMIC-III). The results suggest that this approach is promising for segmenting clinical notes into sections for subsequent NLP tasks.
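To make the idea concrete, the following is a minimal sketch of how an HMM could label the lines of a note with sections once each line has been reduced to a UMLS semantic type. It is not the authors' implementation: it uses a plain first-order HMM decoded with the Viterbi algorithm rather than the heterogeneous model described above, it covers only three of the seven sections, it assumes one dominant semantic type per line, and every probability in it is a made-up toy value rather than a distribution estimated from MIMIC-III. The semantic type abbreviations (`dsyn`, `phsu`, `bpoc`) are standard UMLS ones; the names `START`, `TRANSITION`, `EMISSION`, and `viterbi` are hypothetical.

```python
import math

# Hidden states: a subset of the section types, for brevity.
SECTIONS = ["Medical History", "Medications", "Physical Exam"]

# Hypothetical toy parameters. In the described system these would be
# estimated from annotated notes; the numbers here are invented.
START = {"Medical History": 0.8, "Medications": 0.1, "Physical Exam": 0.1}

TRANSITION = {
    "Medical History": {"Medical History": 0.7, "Medications": 0.2, "Physical Exam": 0.1},
    "Medications": {"Medical History": 0.1, "Medications": 0.7, "Physical Exam": 0.2},
    "Physical Exam": {"Medical History": 0.1, "Medications": 0.1, "Physical Exam": 0.8},
}

# P(semantic type | section). dsyn = Disease or Syndrome,
# phsu = Pharmacologic Substance, bpoc = Body Part, Organ, or Organ Component.
EMISSION = {
    "Medical History": {"dsyn": 0.6, "phsu": 0.2, "bpoc": 0.2},
    "Medications": {"dsyn": 0.1, "phsu": 0.8, "bpoc": 0.1},
    "Physical Exam": {"dsyn": 0.2, "phsu": 0.1, "bpoc": 0.7},
}


def viterbi(observations):
    """Return the most likely section label for each observed semantic type."""
    if not observations:
        return []

    # Log-probability of the best path ending in each section after line 0.
    scores = {
        s: math.log(START[s]) + math.log(EMISSION[s][observations[0]])
        for s in SECTIONS
    }
    backpointers = []

    for obs in observations[1:]:
        new_scores = {}
        pointers = {}
        for cur in SECTIONS:
            # Best previous section to transition from into `cur`.
            best_prev = max(
                SECTIONS, key=lambda p: scores[p] + math.log(TRANSITION[p][cur])
            )
            new_scores[cur] = (
                scores[best_prev]
                + math.log(TRANSITION[best_prev][cur])
                + math.log(EMISSION[cur][obs])
            )
            pointers[cur] = best_prev
        scores = new_scores
        backpointers.append(pointers)

    # Trace back from the best final section to recover the full path.
    path = [max(SECTIONS, key=lambda s: scores[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    line_types = ["dsyn", "dsyn", "phsu", "phsu", "bpoc", "bpoc"]
    for sem_type, section in zip(line_types, viterbi(line_types)):
        print(f"{sem_type}\t{section}")
```

With these toy parameters, the six-line input is decoded as two lines of Medical History, two of Medications, and two of Physical Exam. Section boundaries fall wherever the decoded label changes, which is the segmentation a downstream NLP task would consume.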
©2021 AMIA - All rights reserved.