Several institutions provide genomic annotation data, and therefore these data show a significant segmentation and redundancy. Public databases allow access, through their own methods, to genomic and proteomic sequences and related annotation. Although some cross-reference tables are available, they don't cover the complete datasets provided by these databases. The Genomic Annotation Gathering project intends to unify annotation data provided by GenBank and Ensembl. We introduce an intra-species, cross-bank method. Generated results provide an enriched set of cross- references. This method allows for identifying an average of 30% of new cross-references that can be integrated to other utilities dedicated to analyzing related annotation data. By using only sequence comparison, we are able to unify two datasets that previously didn't share any stable cross-bank accession method. The whole process is hosted by the GenOuest platform to provide public access to newly generated cross-references and to allow for regular updates (http://gag.genouest.org).
Keywords: Annotation; CCDS; Consensus Coding Sequence; Cross-references; Database; GAG; GUI; Genomic Annotation Gathering; Graphical User Interface; HSP; High-scoring Segment Pair; Prediction; Sequence comparison.
© 2013 Elsevier B.V. All rights reserved.