BMC Genomics. 2010 Oct 5;11:538.
doi: 10.1186/1471-2164-11-538.

AnnoTrack--a Tracking System for Genome Annotation

Free PMC article

Felix Kokocinski et al. BMC Genomics.

Abstract

Background: As genome sequences are determined for increasing numbers of model organisms, demand has grown for better tools to facilitate unified genome annotation efforts by communities of biologists. Typically this process involves numerous experts from the field and the use of data from dispersed sources as evidence. This kind of collaborative annotation project requires specialized software solutions for efficient data tracking and processing.

Results: As part of the scale-up phase of the ENCODE project (Encyclopedia of DNA Elements), the aim of the GENCODE project is to produce a highly accurate evidence-based reference gene annotation for the human genome. The AnnoTrack software system was developed to aid this effort. It integrates data from multiple distributed sources, highlights conflicts and facilitates the quick identification, prioritisation and resolution of problems during the process of genome annotation.

Conclusions: AnnoTrack has been in use for the last year and has proven a very valuable tool for large-scale genome annotation. Designed to interface with standard bioinformatics components, such as DAS servers and Ensembl databases, it is easy to set up and configure for different genome projects. The source code is available at http://annotrack.sanger.ac.uk.

Figures

Figure 1
Layout of the AnnoTrack system. Input: Heterogeneous sources are accessed by source adaptors and the data is integrated or analysed directly. Output: All data can be retrieved using the Perl API or the web interface; selected data is exported using DAS.
Figure 2
Flow of data within AnnoTrack. Annotation comparisons are based on genomic coordinates. Users are directed to the most interesting or most urgent issues to work on. Legend: Rectangles = data sources, diamonds = tests, rhombs = actions, light grey = steps accomplished by the AnnoTrack system.
Figure 3
Example of a transcript under review. Complete workflow showing the tracking of annotation updates based on an external analysis as described in the text. 3.a: Transcript list page showing filter options, predefined filters and a filtered list of transcripts with open problems in the selected region, sorted by genomic start. 3.b: Detail page of a transcript selected from 3.a, with basic annotation data, links to the ID or genomic region in public genome browsers, a list of flags for this transcript with links to resolve them individually or combined, coordinates of the exons of this transcript, a gene model representation, and links to other transcripts in the region to allow region-wise problem resolution. 3.c: View of the page used to resolve selected flags from 3.b with controlled terms. 3.d: History showing all changes. 3.e: Statistics page for monitoring problem solutions. More screenshots are available at [15].
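
The Figure 2 caption says that annotation comparisons are based on genomic coordinates and that users are directed to open issues. As a rough illustration only, the Python sketch below compares an external set of transcripts against a reference set by coordinate overlap and emits flags for review. It is not AnnoTrack's code (the abstract describes a Perl API); the `Transcript` class, the `flag_conflicts` function, and the flag wording are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    """Hypothetical minimal transcript model (not AnnoTrack's schema)."""
    name: str
    chrom: str
    strand: int    # +1 or -1
    exons: tuple   # (start, end) pairs, sorted, 1-based inclusive


def span(t):
    """Return the genomic start and end covered by a transcript."""
    return t.exons[0][0], t.exons[-1][1]


def overlaps(a, b):
    """True if two transcripts share chromosome and strand and their spans intersect."""
    if a.chrom != b.chrom or a.strand != b.strand:
        return False
    a_start, a_end = span(a)
    b_start, b_end = span(b)
    return a_start <= b_end and b_start <= a_end


def flag_conflicts(reference, external):
    """Compare external transcripts to the reference set by coordinates.

    Returns (transcript name, reason) pairs ordered by genomic start.
    The two reasons used here are illustrative assumptions.
    """
    flags = []
    for ext in sorted(external, key=lambda t: (t.chrom, span(t)[0])):
        hits = [ref for ref in reference if overlaps(ref, ext)]
        if not hits:
            flags.append((ext.name, "no overlapping annotation"))
        elif all(ref.exons != ext.exons for ref in hits):
            flags.append((ext.name, "exon structure differs"))
    return flags
```

An external transcript whose exons exactly match an overlapping reference transcript produces no flag; one that overlaps but has different exon boundaries, or one that overlaps nothing, is flagged. A real system would need finer-grained tests than this span-level check.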


References

    1. Harrow J, Denoeud F, Frankish A, Reymond A, Chen CK, Chrast J, Lagarde J, Gilbert JG, Storey R, Swarbreck D, Rossier C, Ucla C, Hubbard T, Antonarakis SE, Guigo R. GENCODE: producing a reference annotation for ENCODE. Genome Biol. 2006;7(Suppl 1):S4.1-9. doi: 10.1186/gb-2006-7-s1-s4.
    2. GENCODE project pages. http://www.gencodegenes.org
    3. Pruitt KD, Harrow J, Harte RA, Wallin C, Diekhans M, Maglott DR, Searle S, Farrell CM, Loveland JE, Ruef BJ, Hart E, Suner MM, Landrum MJ, Aken B, Ayling S, Baertsch R, Fernandez-Banet J, Cherry JL, Curwen V, Dicuccio M, Kellis M, Lee J, Lin MF, Schuster M, Shkeda A, Amid C, Brown G, Dukhanina O, Frankish A, Hart J, Maidak BL, Mudge J, Murphy MR, Murphy T, Rajan J, Rajput B, Riddick LD, Snow C, Steward C, Webb D, Weber JA, Wilming L, Wu W, Birney E, Haussler D, Hubbard T, Ostell J, Durbin R, Lipman D. The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes. Genome Res. 2009;19:1316–1323. doi: 10.1101/gr.080531.108.
    4. Flicek P, Aken BL, Ballester B, Beal K, Bragin E, Brent S, Chen Y, Clapham P, Coates G, Fairley S, Fitzgerald S, Fernandez-Banet J, Gordon L, Gräf S, Haider S, Hammond M, Howe K, Jenkinson A, Johnson N, Kähäri A, Keefe D, Keenan S, Kinsella R, Kokocinski F, Koscielny G, Kulesha E, Lawson D, Longden I, Massingham T, McLaren W, Megy K, Overduin B, Pritchard B, Rios D, Ruffier M, Schuster M, Slater G, Smedley D, Spudich G, Tang YA, Trevanion S, Vilella A, Vogel J, White S, Wilder SP, Zadissa A, Birney E, Cunningham F, Dunham I, Durbin R, Fernández-Suarez XM, Herrero J, Hubbard TJ, Parker A, Proctor G, Smith J, Searle SM. Ensembl's 10th year. Nucleic Acids Res. 2010;38:D557–562. doi: 10.1093/nar/gkp972.
    5. Genome Reference Consortium report system. http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/ReportAnIssue.shtml
