In search of genome annotation consistency: solid gene clusters and how to use them

James J Davis; Gary J Olsen; Ross Overbeek; Veronika Vonstein; Fangfang Xia

doi:10.1007/s13205-013-0152-2

3 Biotech. 2014 Jun;4(3):331-335. doi: 10.1007/s13205-013-0152-2. Epub 2013 Jul 6.

Authors

James J Davis¹, Gary J Olsen²,³, Ross Overbeek⁴,⁵, Veronika Vonstein⁴, Fangfang Xia⁵

Affiliations

¹ Institute for Genomic Biology, MC-195, University of Illinois at Urbana-Champaign, 1206 W. Gregory Dr., Urbana, IL, 61801, USA. james2@illinois.edu.
² Institute for Genomic Biology, MC-195, University of Illinois at Urbana-Champaign, 1206 W. Gregory Dr., Urbana, IL, 61801, USA.
³ Department of Microbiology, University of Illinois at Urbana-Champaign, 601 S. Goodwin Ave., Urbana, IL, 61801, USA.
⁴ Fellowship for Interpretation of Genomes, 15W155 81st St., Burr Ridge, IL, 60527, USA.
⁵ Mathematics and Computer Science, Argonne National Laboratory, 9700 S. Cass Ave., Argonne, IL, 60439, USA.

Abstract

Maintaining consistency in genome annotations is important for supporting many computational tasks, particularly metabolic modeling. The SEED project has implemented a process that improves annotation consistency across microbial genomes for proteins with conserved sequences and genomic context. In this research report, we describe this process and show how it has improved microbial genome annotations in the SEED. We also compare the annotation consistency of the SEED with that of other commonly used resources such as IMG (the Joint Genome Institute's Integrated Microbial Genomes system), RefSeq (the National Center for Biotechnology Information's Reference Sequence Database), Swiss-Prot (the annotated protein sequence database of the Swiss Institute of Bioinformatics, European Molecular Biology Laboratory and the European Bioinformatics Institute) and TrEMBL (Translated European Molecular Biology Laboratory nucleotide sequence data Library). Our analysis indicates that manual and computational efforts are paying off for the databases where consistency is a major goal.

Keywords: Automatic annotation; Protein clusters.

Publication types

Case Reports

Grants and funding

HHSN272200900040C/AI/NIAID NIH HHS/United States