Increasingly used high-throughput experimental techniques, such as DNA or protein microarrays, yield groups of interesting genes (e.g. differentially regulated genes) that require further biological interpretation. With the systematic functional annotation provided by the Gene Ontology, the information required to automate this interpretation is now accessible. However, how to determine the statistical significance of a biological process within these groups is still an open question. In answering it, multiple testing issues must be taken into account to avoid misleading results. Here we present a statistical framework that tests whether functions, processes or locations described in the Gene Ontology are significantly enriched within a group of interesting genes when compared to a reference group. First, we define an exact analytical expression for the expected number of false positives, which allows us to calculate adjusted p-values that control the false discovery rate. Next, we demonstrate and discuss the capabilities of our approach using publicly available microarray data on cell-cycle regulated genes. Further, we analyze the robustness of our framework with respect to the exact gene group composition and compare its performance with that of earlier approaches. The software package GOSSIP implements our method and is freely available at http://gossip.gene-groups.net/.
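As a minimal illustration of the kind of enrichment test described above, the following Python sketch computes a one-sided p-value for a single Gene Ontology term and then adjusts a list of such p-values for multiple testing. It is not the GOSSIP implementation. It assumes that the group of interesting genes is a subset of the reference group, uses the upper tail of the hypergeometric distribution (equivalent to a one-sided Fisher's exact test), and substitutes the standard Benjamini-Hochberg adjustment for the exact analytical expression for the expected number of false positives, which is not reproduced here. All function names and counts are hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """Upper hypergeometric tail P(X >= k) for one annotation term.

    N: number of genes in the reference group
    K: reference genes annotated with the term
    n: number of genes in the test group (assumed to be drawn from the reference)
    k: test-group genes annotated with the term
    """
    total = comb(N, n)
    upper = min(n, K)
    # Sum the probabilities of seeing k, k+1, ..., upper annotated genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    This is a standard FDR adjustment used here as a stand-in; it is an
    assumption of this sketch, not the adjustment defined in the paper.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical counts for three terms: (k, n, K, N)
term_counts = [(12, 100, 40, 6000), (3, 100, 150, 6000), (1, 100, 5, 6000)]
raw = [enrichment_p_value(*counts) for counts in term_counts]
adjusted = benjamini_hochberg(raw)
```

A term would then be reported as enriched when its adjusted p-value falls below a chosen false discovery rate threshold.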