Expanding the extent of a UMLS semantic type via group neighborhood auditing

J Am Med Inform Assoc. Sep-Oct 2009;16(5):746-57. doi: 10.1197/jamia.M2951. Epub 2009 Jun 30.


Objective: Each Unified Medical Language System (UMLS) concept is assigned one or more semantic types (STs). This paper presents a dynamic methodology that helps an auditor find concepts missing the assignment of a given ST, S.

Design: The first part of the methodology exploits the previously introduced Refined Semantic Network and its accompanying refined semantic types (RSTs) to narrow the search space for offending concepts. Auditing focuses on a neighborhood, called an envelope, surrounding the extent of an RST T of S; the envelope consists of the parents and children of the concepts in the extent. The audit moves outward as long as missing assignments are discovered. In the second part, concepts not reached previously are processed and reassigned T as needed during the processing of S's other RSTs. The set of such concepts is expanded in a way similar to that of the first part.
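The outward-moving audit of the first part can be pictured with a minimal Python sketch. It rests on several assumptions not stated in the abstract: the hierarchy is given as plain dictionaries `parents` and `children`, the auditor's judgment is modeled by a hypothetical callback `is_missing`, and the function names `envelope` and `audit_neighborhood` are illustrative only. It is not the authors' implementation.

```python
from typing import Callable, Dict, Set, Tuple

Hierarchy = Dict[str, Set[str]]


def envelope(extent: Set[str], parents: Hierarchy, children: Hierarchy) -> Set[str]:
    """Parents and children of concepts in the extent, excluding the extent itself."""
    neighbors: Set[str] = set()
    for concept in extent:
        neighbors |= parents.get(concept, set())
        neighbors |= children.get(concept, set())
    return neighbors - extent


def audit_neighborhood(
    extent: Set[str],
    parents: Hierarchy,
    children: Hierarchy,
    is_missing: Callable[[str], bool],  # hypothetical stand-in for the human auditor
) -> Tuple[Set[str], Set[str]]:
    """Expand the extent outward while reviews keep finding missing assignments.

    Returns (concepts found to be missing the assignment, concepts examined).
    """
    current = set(extent)
    examined: Set[str] = set()
    added: Set[str] = set()
    frontier = envelope(current, parents, children)
    while frontier:
        examined |= frontier
        found = {c for c in frontier if is_missing(c)}
        if not found:
            break  # no new errors in this envelope, so stop moving outward
        added |= found
        current |= found
        # Next envelope: neighbors of the enlarged extent not yet reviewed.
        frontier = envelope(current, parents, children) - examined
    return added, examined
```

In this sketch the loop stops as soon as one envelope yields no new errors, which mirrors the statement that the audit continues only as long as missing assignments are discovered. The second part of the methodology is not modeled here.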

Measurements: The number of errors discovered is reported. To measure the methodology's efficiency, "error hit rates" (i.e., the number of errors found relative to the number of concepts examined) are computed.
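As a small illustration, an error hit rate can be computed as a simple ratio. This sketch assumes the rate is the number of errors found divided by the number of concepts examined; the function name `error_hit_rate` is hypothetical, and the counts in the usage line are made up for illustration, not taken from the study.

```python
def error_hit_rate(errors_found: int, concepts_examined: int) -> float:
    """Fraction of examined concepts that turned out to be errors (assumed definition)."""
    if concepts_examined <= 0:
        raise ValueError("concepts_examined must be positive")
    if not 0 <= errors_found <= concepts_examined:
        raise ValueError("errors_found must lie between 0 and concepts_examined")
    return errors_found / concepts_examined


# Illustrative counts only: 1 error among 4 examined concepts.
rate = error_hit_rate(1, 4)
```

A higher rate means that less auditor effort is spent on concepts that turn out to be correctly assigned.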

Results: The methodology was applied to three STs: Experimental Model of Disease (EMD), Environmental Effect of Humans, and Governmental or Regulatory Activity. EMD underwent the most drastic change. For its RST "EMD intersection Neoplastic Process", which originally had only 33 concepts, the first part found 915 concepts missing the EMD assignment. For its RST "EMD", which originally had only 31 concepts, the second part found 134 such concepts. Changes to the other two STs were smaller.

Conclusion: The results show that the proposed auditing methodology can help to effectively and efficiently identify concepts lacking the assignment of a particular semantic type.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Humans
  • Information Storage and Retrieval*
  • Quality Control
  • Semantics
  • Unified Medical Language System*