Biological profiling of gene groups utilizing Gene Ontology

Genome Inform. 2005;16(1):106-15.


High-throughput experimental techniques such as DNA or protein microarrays are increasingly used and yield groups of interesting genes (e.g. differentially regulated ones) that require further biological interpretation. The systematic functional annotation provided by the Gene Ontology now makes the information needed to automate this interpretation accessible. However, determining the statistical significance of a biological process within these groups is still an open question. In answering it, multiple testing issues must be taken into account to avoid misleading results. Here we present a statistical framework that tests whether functions, processes or locations described in the Gene Ontology are significantly enriched within a group of interesting genes when compared to a reference group. First, we define an exact analytical expression for the expected number of false positives, which allows us to calculate adjusted p-values that control the false discovery rate. Next, we demonstrate and discuss the capabilities of our approach using publicly available microarray data on cell-cycle regulated genes. Further, we analyze the robustness of our framework with respect to the exact gene group composition and compare its performance with earlier approaches. The software package GOSSIP implements our method and is made freely available at
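
The kind of test described above can be illustrated with a minimal sketch. It is not the GOSSIP implementation: it assumes a one-sided hypergeometric (Fisher-type) test for enrichment of each term in the test group relative to the reference group, and it substitutes the generic Benjamini-Hochberg step-up adjustment for the exact analytical expression of the expected number of false positives mentioned in the abstract. All function names, term labels and counts are hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric tail P(X >= k).

    N: genes in the reference group, K: reference genes annotated with the term,
    n: genes in the test group (a subset of the reference), k: annotated test genes.
    """
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (assumption: a generic FDR
    procedure, NOT the paper's exact analytical expression)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical counts: (annotated genes in test group, annotated genes in reference).
    N, n = 1000, 50
    terms = {"term_A": (12, 40), "term_B": (3, 60), "term_C": (5, 30)}
    names = list(terms)
    raw = [enrichment_p_value(terms[t][0], n, terms[t][1], N) for t in names]
    for name, p, q in zip(names, raw, benjamini_hochberg(raw)):
        print(f"{name}: p={p:.3g}, adjusted p={q:.3g}")
```

Terms whose adjusted p-value falls below a chosen threshold would be reported as enriched; the threshold and the choice of reference group remain up to the analyst.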

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Binding Sites / genetics
  • Cell Cycle / genetics*
  • Cell Cycle / physiology
  • Data Interpretation, Statistical
  • False Positive Reactions
  • G1 Phase
  • G2 Phase
  • Gene Expression Profiling*
  • Gene Frequency
  • HeLa Cells
  • Humans
  • Mitosis
  • Models, Statistical
  • Oligonucleotide Array Sequence Analysis
  • Reference Standards
  • Reproducibility of Results
  • S Phase
  • Software
  • Transcription Factors / metabolism
  • Up-Regulation

Substances

  • Transcription Factors