Quality of computationally inferred gene ontology annotations

Nives Skunca; Adrian Altenhoff; Christophe Dessimoz

doi:10.1371/journal.pcbi.1002533

PLoS Comput Biol. 2012 May;8(5):e1002533. doi: 10.1371/journal.pcbi.1002533. Epub 2012 May 31.

Authors

Nives Skunca¹, Adrian Altenhoff, Christophe Dessimoz

Affiliation

¹ Ruđer Bošković Institute, Division of Electronics, Zagreb, Croatia.

Abstract

Gene Ontology (GO) has established itself as the undisputed standard for protein function annotation. Most annotations are inferred electronically, i.e. without individual curator supervision, but they are widely considered unreliable. At the same time, we crucially depend on those automated annotations, as most newly sequenced genomes are non-model organisms. Here, we introduce a methodology to systematically and quantitatively evaluate electronic annotations. By exploiting changes in successive releases of the UniProt Gene Ontology Annotation database, we assessed the quality of electronic annotations in terms of specificity, reliability, and coverage. Overall, we found not only that electronic annotations have significantly improved in recent years, but also that their reliability now rivals that of annotations inferred by curators when they use evidence other than experiments from primary literature. This work provides the means to identify the subset of electronic annotations that can be relied upon, an important outcome given that >98% of all annotations are inferred without direct curation.
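The release-comparison idea in the abstract can be illustrated with a small sketch. The abstract does not give the formulas, so the definitions below are assumptions made for illustration only: an electronic annotation from an older release counts as "confirmed" if the same protein-term pair appears with experimental evidence in a newer release, and as "rejected" if the pair has disappeared from the newer release altogether. The function name `evaluate`, the set-of-pairs data layout, and the toy identifiers are hypothetical and are not taken from the paper or from the UniProt-GOA file format. Specificity is not covered here.

```python
from typing import Optional, Set, Tuple

# A (protein_id, go_term) pair; identifiers below are made up for illustration.
Annotation = Tuple[str, str]


def evaluate(
    old_electronic: Set[Annotation],
    new_experimental: Set[Annotation],
    new_all: Set[Annotation],
) -> Tuple[Optional[float], Optional[float]]:
    """Return (reliability, coverage) under the assumed definitions.

    Assumption: reliability = confirmed / (confirmed + rejected)
    Assumption: coverage    = confirmed / experimental annotations in new release
    """
    # Old electronic annotations later backed by experimental evidence.
    confirmed = old_electronic & new_experimental
    # Old electronic annotations that no longer exist in the newer release.
    rejected = old_electronic - new_all

    judged = len(confirmed) + len(rejected)
    reliability = len(confirmed) / judged if judged else None
    coverage = len(confirmed) / len(new_experimental) if new_experimental else None
    return reliability, coverage


# Toy data: two hypothetical releases.
old_electronic = {("P1", "GO:A"), ("P1", "GO:B"), ("P2", "GO:A"), ("P3", "GO:C")}
new_experimental = {("P1", "GO:A"), ("P2", "GO:A"), ("P4", "GO:D")}
new_all = new_experimental | {("P3", "GO:C")}  # P3/GO:C is kept but unverified

reliability, coverage = evaluate(old_electronic, new_experimental, new_all)
print(f"reliability={reliability:.2f} coverage={coverage:.2f}")
```

In this toy example, annotations that persist without experimental confirmation (such as P3/GO:C) are neither confirmed nor rejected, so they do not enter the reliability estimate. A real analysis would also need to handle the ontology structure, for instance a more general term confirming or being implied by a more specific one, which this sketch ignores.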

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Computational Biology / methods*
Database Management Systems
Databases, Genetic*
Molecular Sequence Annotation / methods*
Reproducibility of Results
Vocabulary, Controlled*