Automated non-alphanumeric symbol resolution in clinical texts

SungRim Moon; Serguei Pakhomov; James Ryan; Genevieve B Melton

AMIA Annu Symp Proc. 2011;2011:979-86. Epub 2011 Oct 22.

Affiliation

SungRim Moon: Institute for Health Informatics, University of Minnesota, Minneapolis, MN, USA.

PMID: 22195157
PMCID: PMC3243158

Abstract

Although clinical texts contain many symbols, medical natural language processing (NLP) researchers have given relatively little attention to symbol resolution. Interpreting the meaning of a symbol can be viewed as a special case of word sense disambiguation (WSD). One thousand instances of four common non-alphanumeric symbols ('+', '-', '/', and '#') were randomly extracted from a clinical document repository and annotated by experts. The symbols, their surrounding context, bag-of-words (BoW), and heuristic rules were evaluated as features for three classifiers (Naïve Bayes, Support Vector Machine, and Decision Tree) using 10-fold cross-validation. With Naïve Bayes, accuracies for '+', '-', '/', and '#' were 80.11%, 80.22%, 90.44%, and 95.00%, respectively. Symbol context contributed the most, but BoW also helped disambiguate some symbols. Supervised symbol disambiguation can be implemented with reasonable accuracy as a module for medical NLP systems.
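The abstract frames symbol interpretation as supervised WSD: context, BoW, and rule features are fed to a classifier and scored with 10-fold cross-validation. The Python sketch below illustrates that setup with a from-scratch multinomial Naïve Bayes; it is not the authors' implementation. The two-token window, the feature names, the single heuristic rule (`RULE=digit_before`), the sense labels ('positive', 'and'), and the six toy instances of '+' are all hypothetical, and the deterministic round-robin folds stand in for whatever fold assignment the study actually used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase, then split into alphanumeric runs and single symbols.
    return re.findall(r"[a-z0-9]+|[^\sa-z0-9]", text.lower())


def extract_features(left, symbol, right, window=2):
    """Hypothetical feature set: the symbol, position-tagged neighbors
    within `window` tokens, one illustrative heuristic rule, and a
    bag of words (BoW) over the whole context."""
    left_toks, right_toks = tokenize(left), tokenize(right)
    feats = [f"SYM={symbol}"]
    for k in range(1, window + 1):
        feats.append(f"L{k}={left_toks[-k] if len(left_toks) >= k else '<s>'}")
        feats.append(f"R{k}={right_toks[k - 1] if len(right_toks) >= k else '</s>'}")
    # Illustrative rule (an assumption): a number directly before the symbol.
    if left_toks and left_toks[-1].isdigit():
        feats.append("RULE=digit_before")
    feats.extend(f"BOW={t}" for t in left_toks + right_toks)
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.prior = Counter(y)
        self.counts = {c: Counter() for c in self.classes}
        for feats, label in zip(X, y):
            self.counts[label].update(feats)
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        self.n = len(y)
        return self

    def predict(self, feats):
        v = len(self.vocab)

        def log_posterior(c):
            score = math.log(self.prior[c] / self.n)
            for f in feats:
                if f in self.vocab:  # ignore features never seen in training
                    score += math.log((self.counts[c][f] + 1) / (self.totals[c] + v))
            return score

        return max(self.classes, key=log_posterior)


def cross_val_accuracy(X, y, k=10):
    """k-fold cross-validation with deterministic round-robin folds."""
    correct = 0
    for fold in range(k):
        test_idx = range(fold, len(y), k)
        train_idx = [i for i in range(len(y)) if i % k != fold]
        model = NaiveBayes().fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(model.predict(X[i]) == y[i] for i in test_idx)
    return correct / len(y)


# Hypothetical toy instances of '+': (left context, symbol, right context, sense).
DATA = [
    ("hepatitis b surface antigen", "+", "on admission", "positive"),
    ("urine culture", "+", "for e coli", "positive"),
    ("blood cultures", "+", "x2", "positive"),
    ("aspirin", "+", "plavix daily", "and"),
    ("lisinopril", "+", "hctz", "and"),
    ("metoprolol", "+", "aspirin", "and"),
]

X = [extract_features(left, sym, right) for left, sym, right, _ in DATA]
y = [sense for *_, sense in DATA]
model = NaiveBayes().fit(X, y)
print(model.predict(extract_features("sputum culture", "+", "for mrsa")))  # -> positive
print(model.predict(extract_features("atenolol", "+", "aspirin")))  # -> and
print(cross_val_accuracy(X, y, k=3))
```

Keeping the symbol, its positional neighbors, and the BoW as separate feature namespaces mirrors the abstract's distinction between symbol context and BoW, so each feature group could be dropped in turn to compare its contribution.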

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Artificial Intelligence*
Bayes Theorem
Decision Trees*
Electronic Health Records
Language
Natural Language Processing*
Pilot Projects
Support Vector Machine*
