Differentiating Sense through Semantic Interaction Data

AMIA Annu Symp Proc. 2017 Feb 10;2016:1238-1247. eCollection 2016.


Words that differ in form but are semantically related, such as dementia and delirium, can be difficult to tell apart when interpreting text. We explore the use of interaction frequency data between semantic elements as a means of differentiating concept pairs, using semantic predications extracted from the biomedical literature. For each semantically related pair, we applied datasets of features drawn from semantic predications to two Expectation Maximization clustering processes (one without and one with concept labels), then used all data to train and evaluate several concept classification algorithms. Of the unlabeled datasets, 80% displayed the expected cluster count and similar or matching proportions; all labeled datasets exhibited similar or matching proportions when the cluster count was restricted to the number of unique labels. The highest-performing classifier achieved 89% accuracy, with F1 scores for individual concept classification ranging from 0.69 to 1. We conclude with a discussion of how these findings may be applied to natural language processing of clinical text.
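
As a rough illustration of what "interaction frequency" features could look like, the sketch below counts how often a concept fills each role of each predicate in a set of subject-predicate-object predications. It is a minimal sketch under stated assumptions, not the method used in the study: the feature layout, the function names interaction_features and to_vector, and the toy predications are all hypothetical and invented for illustration.

```python
from collections import Counter

# Hypothetical toy predications as (subject, predicate, object) triples.
# Invented for illustration only; this is not data from the study.
PREDICATIONS = [
    ("Donepezil", "TREATS", "Dementia"),
    ("Dementia", "COEXISTS_WITH", "Depression"),
    ("Donepezil", "TREATS", "Dementia"),
    ("Haloperidol", "TREATS", "Delirium"),
    ("Surgery", "CAUSES", "Delirium"),
]


def interaction_features(predications, concept):
    """Count how often `concept` fills each (role, predicate) slot."""
    counts = Counter()
    for subject, predicate, obj in predications:
        if subject == concept:
            counts[("subject", predicate)] += 1
        if obj == concept:
            counts[("object", predicate)] += 1
    return counts


def to_vector(counts, feature_names):
    """Lay the counts out in a fixed feature order, with 0 for absent features."""
    return [counts.get(name, 0) for name in feature_names]


dementia = interaction_features(PREDICATIONS, "Dementia")
delirium = interaction_features(PREDICATIONS, "Delirium")

# A shared, sorted feature order lets the two concepts be compared directly.
feature_names = sorted(set(dementia) | set(delirium))
dementia_vector = to_vector(dementia, feature_names)
delirium_vector = to_vector(delirium, feature_names)
```

Vectors of this kind could then be passed to a clustering or classification routine; which features and algorithms the authors actually used is not specified here.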

MeSH terms

  • Algorithms*
  • Cluster Analysis
  • Humans
  • Natural Language Processing*
  • Semantics*
  • Software
  • Terminology as Topic*