Analysis of MeSH Indexing Patterns and Frequency of Predicates

Stud Health Technol Inform. 2018:247:666-670.

Abstract

Organised repositories of published scientific literature represent a rich source for research in knowledge representation. MEDLINE, one of the largest and most popular biomedical literature databases, provides metadata for over 24 million articles, each of which is indexed using the MeSH controlled vocabulary. In order to reuse MeSH annotations for knowledge construction, we processed this data and extracted the most relevant patterns of assigned descriptors over time. The patterns consist of UMLS semantic groups related to the MeSH headings together with their associated MeSH subheadings. Then, we connected the patterns with the most frequent predicates in their corresponding MEDLINE abstracts. Thereafter, we conducted a time series analysis of the extracted patterns from MEDLINE records and their associated predicates in order to study the evolution of manual MeSH indexing. The results show an increasing diversity of the assigned MeSH terms over time, along with the increase in the number of scientific publications per year. We obtained evidence of consistency of the relevant predicates associated with the extracted patterns. Moreover, for the most frequent patterns, some predicates predominate over others, such as Treats between substances and disorders, Causes between pairs of disorders, or Interacts between pairs of substances.
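The kind of counting described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record layout, the function names, and the toy data are all hypothetical, and the abstract does not say how patterns or predicates were actually extracted. Here a pattern is assumed to be a pair of (UMLS semantic group, MeSH subheading) annotations, and each record is assumed to carry one predicate already extracted from the abstract text.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (publication year, pattern, predicate).
# A pattern is assumed to be a pair of (semantic group, subheading) tuples.
# The values below are illustrative only, not data from the study.
DRUG_DISORDER = (("CHEM", "therapeutic use"), ("DISO", "drug therapy"))
DISORDER_PAIR = (("DISO", "complications"), ("DISO", "etiology"))
DRUG_PAIR = (("CHEM", "pharmacology"), ("CHEM", "pharmacology"))

records = [
    (2001, DRUG_DISORDER, "Treats"),
    (2001, DRUG_DISORDER, "Treats"),
    (2002, DRUG_DISORDER, "Affects"),
    (2002, DISORDER_PAIR, "Causes"),
    (2003, DRUG_PAIR, "Interacts"),
]


def pattern_counts_by_year(records):
    """Count how often each pattern occurs in each year."""
    counts = defaultdict(Counter)
    for year, pattern, _predicate in records:
        counts[year][pattern] += 1
    return counts


def distinct_patterns_per_year(records):
    """Number of distinct patterns per year, a crude proxy for diversity."""
    counts = pattern_counts_by_year(records)
    return {year: len(counts[year]) for year in sorted(counts)}


def top_predicate_by_pattern(records):
    """Most frequent predicate observed with each pattern."""
    predicates = defaultdict(Counter)
    for _year, pattern, predicate in records:
        predicates[pattern][predicate] += 1
    return {p: c.most_common(1)[0][0] for p, c in predicates.items()}


if __name__ == "__main__":
    print(distinct_patterns_per_year(records))
    for pattern, predicate in top_predicate_by_pattern(records).items():
        print(pattern, "->", predicate)
```

Keeping the per-year counts and the per-pattern predicate counts as separate structures means the diversity trend and the dominant predicate can each be inspected on their own. A real analysis would need a defined tie-breaking rule for predicates with equal frequency, which this sketch leaves to `Counter.most_common`.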

Keywords: MeSH; Pattern extraction; Predicate induction; Time series analysis.

MeSH terms

  • Data Mining*
  • Databases, Factual
  • Humans
  • MEDLINE*
  • Medical Subject Headings*
  • Semantics