Background: In response to the rapid growth of available genome sequences, efforts have been made to develop automatic inference methods to functionally characterize them. Pipelines that infer functional annotation are now routinely used to produce new annotations at a genome scale and for a broad variety of species. These pipelines differ widely in their inference algorithms, confidence thresholds and data sources for reasoning. This heterogeneity makes a comparison of the relative merits of each approach extremely complex. The evaluation of the quality of the resultant annotations is also challenging given there is often no existing gold-standard against which to evaluate precision and recall.
Results: In this paper, we present a pragmatic approach to the study of functional annotations. An ensemble of 12 metrics, describing various aspects of functional annotations, is defined and implemented in a unified framework, which facilitates their systematic analysis and inter-comparison. The use of this framework is demonstrated on three illustrative examples: analysing the outputs of state-of-the-art inference pipelines, comparing electronic versus manual annotation methods, and monitoring the evolution of publicly available functional annotations. The framework is part of the AIGO library (http://code.google.com/p/aigo) for the Analysis and the Inter-comparison of the products of Gene Ontology (GO) annotation pipelines. The AIGO library also provides functionalities to easily load, analyse, manipulate and compare functional annotations and also to plot and export the results of the analysis in various formats.
Conclusions: This work is a step toward developing a unified framework for the systematic study of GO functional annotations. This framework has been designed so that new metrics on GO functional annotations can be added in a very straightforward way.