Aligning UniProt and MeSH - a case study on human protein terms

Stud Health Technol Inform. 2010;160(Pt 2):1030-4.


Terminologies that lack semantic connectivity hamper effective search in biomedical fact databases and document retrieval systems. We focus on integrating two such isolated resources: the term lists of the protein fact database UNIPROT and MESH, the indexing vocabulary of the bibliographic database MEDLINE. The semantic ties we generate result from string matching and term set inclusion. We investigated the implicit terminological overlap between the two resources in the domain of human proteins and evaluated our approach on a sample of 550 randomly selected UNIPROT entries that were manually mapped to their corresponding MESH headings. We achieved 90% precision and 79% recall (applying taxonomy-sensitive metrics). Fortunately, the proteins we were able to map to MESH are discussed in the literature ten times as frequently as those for which the mapping failed.
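
The abstract names string matching and term set inclusion as the two sources of mappings but does not spell out either procedure. The following is a minimal sketch of one possible reading, not the authors' implementation. It assumes that "string matching" means equality after simple normalization and that "term set inclusion" means that all word tokens of a heading occur in the protein term. The function names, the normalization rules, and the sample headings are hypothetical and chosen for illustration only.

```python
import re


def normalize(term):
    """Lowercase a term and replace every run of non-alphanumerics with one space."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", term.lower()).split())


def tokens(term):
    """Return the set of word tokens of a normalized term."""
    return frozenset(normalize(term).split())


def map_term(protein_term, headings):
    """Map a protein term to a heading (assumed strategy, see lead-in).

    Returns (heading, match_type) or None if nothing matches.
    """
    norm = normalize(protein_term)

    # Step 1: string match after normalization.
    for heading in headings:
        if normalize(heading) == norm:
            return heading, "exact"

    # Step 2: token-set inclusion. A heading qualifies if all of its
    # tokens occur in the protein term; prefer the most specific
    # heading, i.e. the one with the most tokens.
    query = tokens(protein_term)
    candidates = [h for h in headings if tokens(h) and tokens(h) <= query]
    if candidates:
        return max(candidates, key=lambda h: len(tokens(h))), "inclusion"

    return None


# Hypothetical sample vocabulary, for illustration only.
HEADINGS = ["Insulin", "Hemoglobins", "Protein Kinases"]

if __name__ == "__main__":
    for term in ["INSULIN", "cAMP-dependent protein kinases", "Titin"]:
        print(term, "->", map_term(term, HEADINGS))
```

Under this reading, an inclusion match links a specific protein name to a broader heading, which is presumably why the evaluation relies on taxonomy-sensitive rather than exact-match metrics.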

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Databases, Bibliographic
  • Databases, Protein*
  • Humans
  • Medical Subject Headings*
  • Proteins / classification
  • Terminology as Topic*
  • United States
  • Vocabulary, Controlled


Substances

  • Proteins