The structure and semantics of clinical notes vary considerably across different Electronic Health Record (EHR) systems, sites, and institutions. Such heterogeneity hampers the portability of natural language processing (NLP) models in extracting information from the text for clinical research or practice. In this study, we evaluate the contextual variation of clinical notes by measuring the semantic and syntactic similarity of the notes of two sets of physicians comprising four medical specialties across EHR migrations at two Mayo Clinic sites. We find significant semantic and syntactic variation imposed by the context of the EHR system and between medical specialties whereas only minor variation is caused by variation of spatial context across sites. Our findings suggest that clinical language models need to account for process differences at the specialty sublanguage level to be generalizable.
©2023 AMIA - All rights reserved.