Evaluating the Impact of Dictionary Updates on Automatic Annotations Based on Clinical NLP Systems

Yadan Fan; Andrew Wen; Feichen Shen; Sunghwan Sohn; Hongfang Liu; Liwei Wang

AMIA Jt Summits Transl Sci Proc. 2019 May 6;2019:714-721. eCollection 2019.

Authors

Yadan Fan¹,², Andrew Wen¹, Feichen Shen¹, Sunghwan Sohn¹, Hongfang Liu¹, Liwei Wang¹

Affiliations

¹ Department of Health Sciences Research, Mayo Clinic College of Medicine, Rochester, MN, United States.
² Institute for Health Informatics, University of Minnesota, Minneapolis, MN, United States.

PMID: 31259028
PMCID: PMC6568114

Abstract

Concept encoding, which maps text spans to concepts in standard terminologies, is a critical component of clinical natural language processing (NLP) systems because it enables semantic interoperability with other clinical applications. Most clinical NLP systems adopt dictionary- or lexicon-based approaches, and the performance of concept encoding is often evaluated against a human-created gold standard built with reference to the most up-to-date standard terminologies available at the time the gold standard was created. As medical science advances, standard terminologies and dictionaries evolve. However, it remains unknown whether dictionary updates affect the performance of concept encoding. To gain further insight, we evaluated the annotation performance of two clinical NLP systems, cTAKES and MedXN, using updated dictionaries. Specifically, we compared their automatic annotation results with gold standards that had been manually created earlier. Our results show that dictionary updates change the annotations produced by clinical NLP systems and that gold standards therefore require temporal management. This, in turn, raises the need for terminology management tools that provide backward version compatibility for updating gold standards.
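To make the abstract's central point concrete, here is a toy illustration of how a dictionary update can change annotations that are scored against a fixed gold standard. It is a minimal sketch under stated assumptions, not the lookup used by cTAKES or MedXN: it assumes whitespace tokenization, greedy longest-match dictionary lookup, and exact-span scoring. The function names (`encode`, `precision_recall`, `annotation_diff`) and the concept IDs (`ID_001` and so on) are hypothetical placeholders, not real terminology codes.

```python
from typing import Dict, List, Set, Tuple

# A mention is (start_token, end_token_exclusive, concept_id).
Mention = Tuple[int, int, str]
Dictionary = Dict[Tuple[str, ...], str]


def encode(tokens: List[str], dictionary: Dictionary, max_len: int = 5) -> Set[Mention]:
    """Map text spans to concept IDs by greedy longest-match dictionary lookup."""
    lowered = [t.lower() for t in tokens]
    mentions: Set[Mention] = set()
    i = 0
    while i < len(lowered):
        match = None
        # Try the longest candidate span first, so a multi-word term wins
        # over any single-word term it contains.
        for n in range(min(max_len, len(lowered) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in dictionary:
                match = (i, i + n, dictionary[key])
                break
        if match is not None:
            mentions.add(match)
            i = match[1]  # skip past the matched span
        else:
            i += 1
    return mentions


def precision_recall(predicted: Set[Mention], gold: Set[Mention]) -> Tuple[float, float]:
    """Exact-match precision and recall against a gold standard."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


def annotation_diff(old: Set[Mention], new: Set[Mention]) -> Tuple[Set[Mention], Set[Mention]]:
    """Return (annotations gained, annotations lost) after a dictionary change."""
    return new - old, old - new


# Hypothetical toy data: the IDs are placeholders, not real terminology codes.
TOKENS = "Patient takes metformin for type 2 diabetes".split()
DICT_V1: Dictionary = {("metformin",): "ID_001", ("diabetes",): "ID_002"}
# Version 2 adds a more specific multi-word term.
DICT_V2: Dictionary = {**DICT_V1, ("type", "2", "diabetes"): "ID_003"}
# Gold standard annotated against version 1 and never updated.
GOLD_V1: Set[Mention] = {(2, 3, "ID_001"), (6, 7, "ID_002")}

if __name__ == "__main__":
    v1 = encode(TOKENS, DICT_V1)
    v2 = encode(TOKENS, DICT_V2)
    print(precision_recall(v1, GOLD_V1))  # (1.0, 1.0)
    print(precision_recall(v2, GOLD_V1))  # (0.5, 0.5)
    print(annotation_diff(v1, v2))
```

In this toy case, the version 2 dictionary prefers the more specific span "type 2 diabetes", yet precision and recall against the version 1 gold standard drop from 1.0 to 0.5 because the gold standard still expects the older annotation. Scoring against a gold standard that is not kept in step with the dictionary therefore mixes genuine encoding errors with version drift, which is the motivation for temporal management of gold standards.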

Keywords: concept encoding; dictionary update; gold standards; natural language processing.