Multiple approaches to fine-grained indexing of the biomedical literature

Pac Symp Biocomput. 2007;292-303.

Abstract

The number of articles in the MEDLINE database is expected to increase tremendously in the coming years. To ensure that all these documents continue to be indexed with high quality, it is necessary to develop tools and methods that help the indexers in their daily task. We present three methods addressing a novel aspect of automatic indexing of the biomedical literature, namely producing MeSH main heading/subheading pair recommendations. The methods (dictionary-based, post-processing rules, and Natural Language Processing rules) are described and evaluated on a genetics-related corpus. The best overall performance is obtained for the subheading genetics (70% precision and 17% recall with post-processing rules, 48% precision and 37% recall with the dictionary-based method). Future work will extend this approach to all MeSH subheadings and study the combination of methods more thoroughly.
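To make the reported figures concrete, the following minimal sketch shows how precision and recall could be computed for main heading/subheading pair recommendations, assuming each recommendation is scored as an exact match against the pairs assigned by indexers. The function names and the toy pairs are hypothetical and are not taken from the paper. The F1 score is also an addition here: the abstract reports only precision and recall.

```python
def precision_recall(recommended, reference):
    """Precision and recall of recommended pairs against reference pairs.

    Both arguments are iterables of (main_heading, subheading) tuples.
    Exact-match scoring is an assumption made for this sketch.
    """
    recommended, reference = set(recommended), set(reference)
    correct = len(recommended & reference)
    precision = correct / len(recommended) if recommended else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


def f1(precision, recall):
    """Harmonic mean of precision and recall (not reported in the abstract)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data, invented for illustration only.
reference = {
    ("Breast Neoplasms", "genetics"),
    ("Mutation", "genetics"),
    ("Proteins", "metabolism"),
}
recommended = {
    ("Breast Neoplasms", "genetics"),
    ("Proteins", "genetics"),
}

p, r = precision_recall(recommended, reference)
print(f"toy example: precision={p:.2f} recall={r:.2f}")

# Precision/recall values quoted in the abstract for the subheading genetics.
for name, (prec, rec) in {
    "post-processing rules": (0.70, 0.17),
    "dictionary-based": (0.48, 0.37),
}.items():
    print(f"{name}: F1={f1(prec, rec):.2f}")
```

Under this derived measure the dictionary-based method scores higher than the post-processing rules, because its gain in recall outweighs its loss in precision. Which trade-off is preferable for indexers depends on how the recommendations are used.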

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, N.I.H., Intramural

MeSH terms

  • Abstracting and Indexing / methods*
  • Abstracting and Indexing / statistics & numerical data
  • Artificial Intelligence
  • Computational Biology
  • Dictionaries, Medical as Topic
  • MEDLINE*
  • Medical Subject Headings
  • Natural Language Processing