Accessing bioscience images from abstract sentences

Bioinformatics. 2006 Jul 15;22(14):e547-56. doi: 10.1093/bioinformatics/btl261.


Images (e.g., figures) are important experimental results that are typically reported in bioscience full-text articles. Biologists need to access images to validate research facts and to formulate or test novel research hypotheses. At the same time, biologists live in an age of information explosion. Because thousands of biomedical articles are published every day, systems that help biologists efficiently access images in the literature would greatly facilitate biomedical research. We hypothesize that much of the image content reported in a full-text article can be summarized by the sentences in the article's abstract. In our study, more than one hundred biologists tested this hypothesis, and more than 40 biologists evaluated a novel user interface, BioEx, that allows biologists to access images directly from abstract sentences. Our results show that 87.8% of the biologists favored BioEx over two other baseline user interfaces. We further developed systems that explored hierarchical clustering algorithms to automatically identify the abstract sentences that summarize the images. One of the systems achieves a precision of 100% at a corresponding recall of 4.6%.
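
The abstract does not say how the clustering systems represent text, which linkage they use, or where they cut the hierarchy. As a rough illustration only, the sketch below assumes that each image is represented by its caption, that text is compared with bag-of-words cosine similarity, and that average-link agglomerative clustering stops at a similarity threshold. All function names, the toy sentences, and the threshold values are hypothetical and are not taken from the paper. An abstract sentence is linked to an image when the sentence and the caption fall into the same cluster.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts (assumed representation, not the paper's)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def average_link_clusters(vectors, threshold):
    """Merge the two most similar clusters (average pairwise similarity)
    until no pair of clusters reaches the threshold."""
    clusters = [[i] for i in range(len(vectors))]

    def sim(c1, c2):
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, x, y = max(
            (sim(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if best < threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


def link_sentences_to_images(sentences, captions, threshold):
    """Return (sentence_index, caption_index) pairs that share a cluster."""
    vectors = [bag_of_words(t) for t in sentences + captions]
    n = len(sentences)
    links = set()
    for cluster in average_link_clusters(vectors, threshold):
        for i in cluster:
            for j in cluster:
                if i < n <= j:  # i is a sentence, j is a caption
                    links.add((i, j - n))
    return links


def precision_recall(predicted, gold):
    """Precision and recall of predicted links against gold links."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


# Toy data (invented for illustration).
sentences = ["protein a binds receptor b", "knockout mice show reduced growth"]
captions = [
    "western blot protein a receptor b binding",
    "growth curves knockout mice wild type",
]
gold = {(0, 0), (1, 1)}

for threshold in (0.3, 0.6):
    links = link_sentences_to_images(sentences, captions, threshold)
    print(threshold, sorted(links), precision_recall(links, gold))
```

On this toy input, raising the threshold keeps only the most similar sentence and caption pair, so precision stays high while recall drops. That is the same kind of trade-off the abstract reports for the system that reaches 100% precision at 4.6% recall.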

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Abstracting and Indexing / methods*
  • Artificial Intelligence
  • Biological Science Disciplines / methods*
  • Computer Graphics*
  • Image Interpretation, Computer-Assisted / methods*
  • Information Storage and Retrieval / methods
  • Natural Language Processing*
  • Periodicals as Topic
  • PubMed*
  • User-Computer Interface*
  • Vocabulary, Controlled