Semi-automatic indexing of full text biomedical articles

AMIA Annu Symp Proc. 2005;2005:271-5.


The main application of the U.S. National Library of Medicine's Medical Text Indexer (MTI) is to provide indexing recommendations to the Library's indexing staff. The current input to MTI consists of the titles and abstracts of the articles to be indexed. This study reports on an extension of MTI to the full text of articles appearing in online medical journals that are indexed for MEDLINE. Using a collection of 17 journal issues containing 500 articles, we report how effectively terms contributed by the whole article, and by each individual section, support indexing. We obtain the best results using a model consisting of the sections Results, Results and Discussion, and Conclusions, together with the article's title and abstract, the captions of tables and figures, and sections that have no titles. The resulting model provides significantly better indexing (a 7.4% improvement) than is currently achieved using only titles and abstracts.
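The section selection described above can be illustrated with a minimal Python sketch. It is not MTI's implementation: the article representation (a dict with "title", "abstract", "captions", and "sections" keys) and the function name select_text_for_indexing are assumptions made for illustration, and matching section titles by exact lowercase comparison is a simplification.

```python
# Section titles the abstract names as part of the best-performing model.
SELECTED_TITLES = {"results", "results and discussion", "conclusions"}


def select_text_for_indexing(article):
    """Collect the text pieces that the described model would keep.

    `article` is a hypothetical dict (not an MTI data structure) with:
      "title"    -> str
      "abstract" -> str
      "captions" -> list of table/figure caption strings
      "sections" -> list of (section_title, section_text) pairs, where
                    section_title may be None or empty for untitled sections
    """
    # Title and abstract are always included, as in the current MTI input.
    parts = [article["title"], article["abstract"]]

    # Captions of tables and figures are part of the model.
    parts.extend(article.get("captions", []))

    for title, text in article.get("sections", []):
        normalized = (title or "").strip().lower()
        # Keep untitled sections and the sections named in the abstract.
        if not normalized or normalized in SELECTED_TITLES:
            parts.append(text)
    return parts


example = {
    "title": "T",
    "abstract": "A",
    "captions": ["Figure 1 caption"],
    "sections": [
        ("Introduction", "intro text"),
        ("Methods", "methods text"),
        ("Results", "results text"),
        (None, "untitled text"),
        ("Conclusions", "conclusions text"),
    ],
}
selected = select_text_for_indexing(example)
```

In this sketch the Introduction and Methods text is dropped, while the title, abstract, caption, Results, untitled, and Conclusions text would be passed on to the indexer.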

MeSH terms

  • Abstracting and Indexing / methods*
  • Algorithms
  • Libraries, Digital
  • Medical Subject Headings*
  • National Library of Medicine (U.S.)
  • Natural Language Processing*
  • Periodicals as Topic
  • United States