Discovering missed synonymy in a large concept-oriented Metathesaurus

Proc AMIA Symp. 2000;354-8.


The Unified Medical Language System (UMLS) [1, 2] Metathesaurus is concept-oriented: its goal is to unite all names with identical meaning in a single Concept. The names come from its constituent vocabularies, or "sources": a wide variety of biomedical terminologies, including many controlled vocabularies and classifications used in patient records, administrative health data, and bibliographic, research, full-text, and expert systems. Many sources offer little definitional information, and many are not themselves concept-oriented, so identifying synonymy is a challenging semantic task [3]. The rapidly increasing size of the Metathesaurus makes the task daunting and demands effective computational support: the January 2000 release contains more than 1.5 million names for 730,000 concepts. Vocabularies are added and updated using sophisticated lexical matching, selective algorithms, and expert review [4, 5, 6]. Yet the result is imperfect; each year we have discovered and corrected missed synonymy in approximately 1% of previously released concepts. This paper reviews general methods for finding missed synonymy and describes several specific novel approaches that we have found effective.
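
As a rough illustration of how lexical matching can surface candidate missed synonymy, the sketch below groups names by a crude normalized key and reports keys shared by more than one concept. It is a minimal sketch, not the method used to build the Metathesaurus: the function names, the stop-word list, the concept identifiers, and the sample names are all hypothetical, and real lexical matching is considerably more sophisticated. Any candidate found this way would still need expert review.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list, chosen only for this illustration.
STOP_WORDS = {"of", "the"}


def normalize(name):
    """Build a crude lexical key: lowercase, drop punctuation and
    stop words, then sort the remaining words alphabetically."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(sorted(w for w in words if w not in STOP_WORDS))


def candidate_missed_synonymy(names):
    """Take (concept_id, name) pairs and return a dict mapping each
    normalized key to the sorted concept ids that share it, keeping
    only keys that appear under two or more different concepts."""
    concepts_by_key = defaultdict(set)
    for concept_id, name in names:
        concepts_by_key[normalize(name)].add(concept_id)
    return {
        key: sorted(ids)
        for key, ids in concepts_by_key.items()
        if len(ids) > 1
    }


# Invented sample data; the identifiers are placeholders, not real
# Metathesaurus concept identifiers.
SAMPLE = [
    ("C-A", "Fracture of femur"),
    ("C-B", "Femur fracture"),
    ("C-B", "Femur, fracture"),
    ("C-C", "Myocardial infarction"),
]

if __name__ == "__main__":
    for key, ids in candidate_missed_synonymy(SAMPLE).items():
        print(key, ids)
```

In this invented sample, the names under "C-A" and "C-B" normalize to the same key, so the two concepts are flagged as a pair to review; names already united in one concept are not reported.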

MeSH terms

  • Algorithms
  • Subject Headings*
  • Unified Medical Language System*
  • Vocabulary, Controlled