UMLS content views appropriate for NLP processing of the biomedical literature vs. clinical text

J Biomed Inform. 2010 Aug;43(4):587-94. doi: 10.1016/j.jbi.2010.02.005. Epub 2010 Feb 10.


Identification of medical terms in free text is a first step in Natural Language Processing (NLP) tasks such as automatic indexing of the biomedical literature and extraction of patients' problem lists from clinical notes. Many tools developed to perform these tasks use biomedical knowledge encoded in the Unified Medical Language System (UMLS) Metathesaurus. We continue our exploration of automatic approaches to creating subsets (UMLS content views) that can support NLP of either the biomedical literature or clinical text. We found that suppressing highly ambiguous terms in the conservative AutoFilter content view can partially replace manual filtering for literature applications, and that suppressing two-character mappings in the same content view achieves 89.5% precision at 78.6% recall for clinical applications.
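The two suppression strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the ambiguity threshold, the toy term-to-concept pairs, and the placeholder concept identifiers are all assumptions made for illustration, and the abstract does not state how ambiguity was measured.

```python
from collections import defaultdict


def build_content_view(mappings, max_ambiguity=None, suppress_two_char=False):
    """Filter (term, concept_id) pairs to form a hypothetical content view.

    max_ambiguity: if set, drop every term that maps to more than this many
        distinct concepts (an assumed definition of "highly ambiguous").
    suppress_two_char: if True, drop every term that is exactly two
        characters long.
    """
    mappings = list(mappings)

    # Count the distinct concepts each (case-folded) term maps to.
    concepts_by_term = defaultdict(set)
    for term, concept_id in mappings:
        concepts_by_term[term.lower()].add(concept_id)

    kept = []
    for term, concept_id in mappings:
        key = term.lower()
        if max_ambiguity is not None and len(concepts_by_term[key]) > max_ambiguity:
            continue  # suppressed as highly ambiguous
        if suppress_two_char and len(term.strip()) == 2:
            continue  # suppressed as a two-character mapping
        kept.append((term, concept_id))
    return kept


# Toy data; the concept identifiers are placeholders, not real UMLS CUIs.
toy_mappings = [
    ("cold", "CUI_1"),
    ("cold", "CUI_2"),
    ("cold", "CUI_3"),
    ("pneumonia", "CUI_4"),
    ("MS", "CUI_5"),
    ("MS", "CUI_6"),
]

# Literature-oriented view: suppress terms with more than two candidate concepts.
literature_view = build_content_view(toy_mappings, max_ambiguity=2)

# Clinically oriented view: suppress two-character mappings.
clinical_view = build_content_view(toy_mappings, suppress_two_char=True)
```

In this toy data, the literature-oriented view drops "cold" because it maps to three concepts, while the clinically oriented view drops "MS" because it is two characters long. The threshold of two is arbitrary and would need tuning against annotated text.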

Publication types

  • Research Support, N.I.H., Intramural

MeSH terms

  • Information Storage and Retrieval / methods
  • Natural Language Processing*
  • Publications
  • Unified Medical Language System / standards*