Content-based image retrieval for scientific literature access

Methods Inf Med. 2009;48(4):371-80. doi: 10.3414/ME0561. Epub 2009 Jul 20.


Objectives: An increasing number of articles are published electronically in the scientific literature, but access to them is limited to alphanumeric search on title, author, or abstract, which may disregard the numerous figures they contain. In this paper, we estimate the benefits of applying content-based image retrieval (CBIR) to article figures to augment traditional access to articles.

Methods: We selected four high-impact journals from the Journal Citation Reports (JCR) 2005. Figures were automatically extracted from the PDF article files and manually classified by their content and number of sub-figure panels. We make a quantitative estimate by projecting from data of the ImageCLEF campaigns of the Cross-Language Evaluation Forum, and qualitatively validate it through experiments using the Image Retrieval in Medical Applications (IRMA) project.
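To make "CBIR on article figures" concrete, here is a minimal sketch that ranks a small collection of figure images by visual similarity to a query image. It assumes grayscale images given as lists of pixel rows (intensities 0 to 255) and uses a normalized gray-level histogram compared by histogram intersection. This is a generic textbook technique chosen for illustration, not the feature set used by the IRMA project or the ImageCLEF systems, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumption (illustration only): a grayscale image is a list of pixel rows
# with integer intensities 0..255. Not the IRMA project's representation.
Image = List[List[int]]


def gray_histogram(img: Image, bins: int = 16) -> List[float]:
    """Normalized gray-level histogram, a simple global image feature."""
    counts = [0] * bins
    total = 0
    for row in img:
        for px in row:
            counts[px * bins // 256] += 1  # map 0..255 onto bins 0..bins-1
            total += 1
    return [c / total for c in counts]


def histogram_intersection(h1: List[float], h2: List[float]) -> float:
    """Similarity in [0, 1] for two normalized histograms (1 = identical)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def rank_by_similarity(query: Image, collection: Dict[str, Image],
                       bins: int = 16) -> List[Tuple[str, float]]:
    """Return (figure id, score) pairs, most similar to the query first."""
    q = gray_histogram(query, bins)
    scores = [(name, histogram_intersection(q, gray_histogram(img, bins)))
              for name, img in collection.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy "figures": tiny 2x2 grayscale images.
    collection = {
        "dark": [[10, 20], [30, 40]],
        "bright": [[200, 220], [240, 250]],
        "mixed": [[10, 250], [20, 240]],
    }
    print(rank_by_similarity([[15, 25], [35, 45]], collection))
```

With a dark query, the dark toy image ranks first, the half-dark "mixed" image second, and the bright image last. Real CBIR systems use far richer features (texture, shape, learned descriptors), but the retrieve-by-ranking structure is the same.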

Results: Based on 2077 articles with 11,753 pages, 4493 figures, and 11,238 individual images, the predicted accuracy for article retrieval may reach 97.08%.
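The reported corpus counts imply a few averages that help gauge how much visual material a CBIR system could index per article. The short computation below derives them. It is plain arithmetic on the numbers in the Results and says nothing about how the 97.08% accuracy estimate was obtained.

```python
# Corpus counts as reported in the Results.
ARTICLES = 2077
PAGES = 11_753
FIGURES = 4493
IMAGES = 11_238  # individual images; our reading: sub-figure panels counted separately


def corpus_ratios() -> dict:
    """Average per-article and per-figure counts, rounded to two decimals."""
    return {
        "pages_per_article": round(PAGES / ARTICLES, 2),
        "figures_per_article": round(FIGURES / ARTICLES, 2),
        "images_per_figure": round(IMAGES / FIGURES, 2),
        "images_per_article": round(IMAGES / ARTICLES, 2),
    }


if __name__ == "__main__":
    for name, value in corpus_ratios().items():
        print(f"{name}: {value}")
```

On these counts, an article averages about 5.66 pages and 2.16 figures, and each figure splits into about 2.50 individual images, or roughly 5.41 images per article.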

Conclusions: CBIR therefore potentially has a high impact on medical literature search and retrieval.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, N.I.H., Intramural

MeSH terms

  • Databases, Bibliographic*
  • Diagnostic Imaging*
  • Humans
  • Information Storage and Retrieval*
  • Internet*