A literature-based similarity metric for biological processes

BMC Bioinformatics. 2006 Jul 26;7:363. doi: 10.1186/1471-2105-7-363.

Abstract

Background: Recent analyses in systems biology pursue the discovery of functional modules within the cell. Recognizing such modules requires the integrative analysis of genome-wide experimental data together with available functional schemes. Along these lines, methods are needed to bridge the gap between the abstract definitions of cellular processes in current schemes and the interlinked nature of biological networks.

Results: This work explores the use of the scientific literature to establish potential relationships among cellular processes. To this end, we used a document-based similarity method to compute pairwise similarities between the biological processes described in the Gene Ontology (GO). The method was applied to the biological processes annotated for the Saccharomyces cerevisiae genome. We compared our results with the similarities obtained with two ontology-based metrics, as well as with gene product annotation relationships. We show that the literature-based metric conserves most direct ontological relationships, while revealing biologically sound similarities that are not obtained using ontology-based metrics and/or genome annotation.
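To make the idea of a document-based similarity between biological processes concrete, here is a minimal sketch. The abstract does not specify the weighting scheme or the similarity function, so this assumes a common vector-space approach: each process is represented by pooling the texts of the documents linked to it, words are weighted with a smoothed TF-IDF, and processes are compared by cosine similarity. The input mapping `term_docs`, the helper names, and the toy texts are all hypothetical and are not the authors' data or implementation.

```python
import math
from collections import Counter


def term_profiles(term_docs):
    """Build one word-count profile per process by pooling the texts of its documents."""
    return {
        term: Counter(word for doc in docs for word in doc.lower().split())
        for term, docs in term_docs.items()
    }


def tfidf(profiles):
    """Weight words by term frequency times a smoothed inverse document frequency.

    Assumption: each process profile counts as one "document" when computing IDF.
    """
    n = len(profiles)
    df = Counter()
    for counts in profiles.values():
        df.update(counts.keys())
    return {
        term: {w: c * (math.log((1 + n) / (1 + df[w])) + 1.0) for w, c in counts.items()}
        for term, counts in profiles.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pairwise_similarities(term_docs):
    """Return {(term_a, term_b): similarity} for every unordered pair of processes."""
    vectors = tfidf(term_profiles(term_docs))
    terms = sorted(vectors)
    return {
        (a, b): cosine(vectors[a], vectors[b])
        for i, a in enumerate(terms)
        for b in terms[i + 1:]
    }


# Hypothetical toy corpus: each biological process maps to texts that mention it.
term_docs = {
    "ion transport": ["ion transport across the plasma membrane",
                      "potassium ion channel activity"],
    "potassium ion transport": ["potassium ion transport",
                                "potassium channel in plasma membrane"],
    "signal transduction": ["signal transduction via MAP kinase cascade",
                            "pheromone response kinase"],
}

sims = pairwise_similarities(term_docs)
for pair, score in sorted(sims.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

In this toy corpus, "ion transport" and "potassium ion transport" share vocabulary and receive a positive score, while neither shares any words with "signal transduction", so those pairs score zero. Comparing such scores against ontology-based metrics or shared gene annotations would be a separate step not shown here.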

Conclusion: The scientific literature is a valuable source of information from which to compute similarities among biological processes. The associations discovered by literature analysis are a valuable complement to those encoded in existing functional schemes and to those that arise from genome annotation. These similarities can be used to conveniently map the interlinked structure of cellular processes in a particular organism.

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Databases, Bibliographic*
  • Databases, Genetic
  • Genes, Fungal
  • Ion Transport / genetics
  • Natural Language Processing
  • Pattern Recognition, Automated
  • Physiological Phenomena*
  • Saccharomyces cerevisiae / genetics*
  • Saccharomyces cerevisiae / physiology
  • Saccharomyces cerevisiae Proteins / classification
  • Saccharomyces cerevisiae Proteins / genetics
  • Saccharomyces cerevisiae Proteins / physiology
  • Signal Transduction / genetics

Substances

  • Saccharomyces cerevisiae Proteins