BayGO: Bayesian analysis of ontology term enrichment in microarray data

BMC Bioinformatics. 2006 Feb 23;7:86. doi: 10.1186/1471-2105-7-86.

Abstract

Background: The search for enriched (also known as over-represented or enhanced) ontology terms in a list of genes obtained from microarray experiments is becoming a standard procedure for system-level analysis. This procedure summarizes the information at the level of classification schemes such as Gene Ontology and KEGG pathways, instead of focusing on individual genes. Although it is well known in statistics that association and significance are distinct concepts, only the latter has been used to deal with the ontology term enrichment problem.

Results: BayGO implements a Bayesian approach to search for enriched terms from microarray data. The R source code is freely available at http://blasto.iq.usp.br/~tkoide/BayGO in three versions: a Linux version, which can be easily incorporated into pre-existing pipelines; a Windows version, to be controlled interactively; and a web tool. The software was validated using a bacterial heat shock response dataset, since this stress triggers known system-level responses.

Conclusion: The Bayesian model accounts for the fact that some genes from a given category may not be observable in microarray data because of low signal intensity, quality filters, genes that were not spotted, and so on. Moreover, BayGO allows one to measure the statistical association between generic ontology terms and differential expression, instead of relying only on the common significance analysis.
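
To make the distinction between association and significance concrete, the following is a minimal sketch in Python of one generic way to quantify association with a posterior distribution. It is not BayGO's model, which the abstract does not specify. It assumes a 2x2 table of observed genes (differentially expressed or not, annotated to the term or not), a symmetric Dirichlet prior on the four cell probabilities, and the log odds ratio as the association measure. The function name `posterior_log_odds_ratio`, its parameters, and the example counts are all hypothetical. The sketch also ignores genes that are not observable on the array, which the model described above does account for.

```python
import math
import random


def posterior_log_odds_ratio(a, b, c, d, prior=1.0, draws=20000, seed=0):
    """Monte Carlo posterior summary of the log odds ratio of a 2x2 table.

    a: differentially expressed genes annotated to the term
    b: differentially expressed genes outside the term
    c: non-differential genes annotated to the term
    d: non-differential genes outside the term
    prior: symmetric Dirichlet pseudo-count added to every cell (assumption)
    """
    rng = random.Random(seed)
    alphas = [a + prior, b + prior, c + prior, d + prior]
    samples = []
    for _ in range(draws):
        # A Dirichlet draw is a vector of normalized Gamma draws. The
        # normalizing constant cancels in the odds ratio, so it is skipped.
        g = [rng.gammavariate(alpha, 1.0) for alpha in alphas]
        samples.append(math.log(g[0] * g[3] / (g[1] * g[2])))
    samples.sort()
    mean = sum(samples) / draws
    lower = samples[int(0.025 * draws)]       # 2.5% posterior quantile
    upper = samples[int(0.975 * draws) - 1]   # 97.5% posterior quantile
    prob_positive = sum(s > 0 for s in samples) / draws
    return mean, lower, upper, prob_positive


# Hypothetical counts: 12 of 50 differential genes carry the term,
# against 20 of 950 non-differential genes.
enriched = posterior_log_odds_ratio(12, 38, 20, 930)
print("mean log OR %.2f, 95%% interval (%.2f, %.2f), P(log OR > 0) = %.3f" % enriched)
```

A credible interval for the log odds ratio reports how strong the association is, whereas a p-value from a significance test only reports how surprising the table would be if there were no association at all.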

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Bacteria / chemistry
  • Bacteria / genetics
  • Bayes Theorem*
  • Computational Biology* / methods
  • Computational Biology* / statistics & numerical data
  • Gene Expression Regulation, Bacterial
  • Heat-Shock Proteins / biosynthesis
  • Heat-Shock Proteins / chemistry
  • Heat-Shock Proteins / genetics
  • Oligonucleotide Array Sequence Analysis / instrumentation
  • Oligonucleotide Array Sequence Analysis / methods*
  • Oligonucleotide Array Sequence Analysis / statistics & numerical data
  • Signal Transduction / genetics
  • Software
  • Terminology as Topic*

Substances

  • Heat-Shock Proteins