Biological profiling of gene groups utilizing Gene Ontology

Genome Inform. 2005;16(1):106-15.

Authors

Nils Blüthgen¹, Karsten Brand, Branka Cajavec, Maciej Swat, Hanspeter Herzel, Dieter Beule

Affiliation

¹ Institute for Theoretical Biology, Humboldt University Berlin, Germany. n.bluethgen@biologie.hu-berlin.de

PMID: 16362912

Abstract

High-throughput experimental techniques such as DNA or protein microarrays are increasingly used and yield groups of interesting genes, e.g. differentially regulated genes, that require further biological interpretation. The systematic functional annotation provided by the Gene Ontology now makes the information needed to automate this interpretation accessible. However, how to determine the statistical significance of a biological process within such groups is still an open question. Answering it requires taking multiple testing into account to avoid misleading results. Here we present a statistical framework that tests whether functions, processes or locations described in the Gene Ontology are significantly enriched within a group of interesting genes when compared to a reference group. First, we define an exact analytical expression for the expected number of false positives, which allows us to calculate adjusted p-values that control the false discovery rate. Next, we demonstrate and discuss the capabilities of our approach using publicly available microarray data on cell-cycle regulated genes. Further, we analyze the robustness of our framework with respect to the exact gene group composition and compare its performance with earlier approaches. The software package GOSSIP implements our method and is freely available at http://gossip.gene-groups.net/.
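
The kind of enrichment test described in the abstract can be illustrated with a minimal sketch. It is not the GOSSIP implementation. It assumes a one-sided hypergeometric test (a common choice for comparing a test group with a reference group) and substitutes the standard Benjamini-Hochberg adjustment for the exact analytical expression mentioned above, which the abstract does not spell out. The function names and the example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric tail P(X >= k).

    N: genes in the reference group
    K: reference genes annotated with the GO term
    n: genes in the test group (assumed to be drawn from the reference)
    k: test-group genes annotated with the GO term
    """
    total = comb(N, n)
    # Sum the probability of observing k or more annotated genes by chance.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (a standard FDR procedure,
    used here as a stand-in; not the paper's exact expression)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical counts for three GO terms: (k, n, K, N)
terms = [(12, 50, 100, 5000), (3, 50, 200, 5000), (1, 50, 40, 5000)]
raw = [enrichment_p_value(*t) for t in terms]
adj = benjamini_hochberg(raw)
```

In practice every GO term is tested against the same test and reference groups, so the number of tests is large and the raw p-values alone would produce many false positives; the adjustment step is what keeps the false discovery rate under control.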

Publication types

Comparative Study
Research Support, Non-U.S. Gov't

MeSH terms

Binding Sites / genetics
Cell Cycle / genetics*
Cell Cycle / physiology
Data Interpretation, Statistical
False Positive Reactions
G1 Phase
G2 Phase
Gene Expression Profiling*
Gene Frequency
HeLa Cells
Humans
Mitosis
Models, Statistical
Oligonucleotide Array Sequence Analysis
Reference Standards
Reproducibility of Results
S Phase
Software
Transcription Factors / metabolism
Up-Regulation

Substances

Transcription Factors