Database (Oxford). 2012 Mar 20;2012:bas008.
doi: 10.1093/database/bas008. Print 2012.

Tracking and Coordinating an International Curation Effort for the CCDS Project

Free PMC article

Rachel A Harte et al. Database (Oxford). 2012.


The Consensus Coding Sequence (CCDS) collaboration involves curators at multiple centers with the goal of producing a conservative set of high-quality protein-coding region annotations for the human and mouse reference genome assemblies. The CCDS data set reflects a 'gold standard' definition of best-supported protein annotations, and corresponding genes, which pass a standard series of quality assurance checks and are supported by manual curation. This data set supports use of genome annotation information by human and mouse researchers for effective experimental design, analysis and interpretation. The CCDS project consists of analysis of automated whole-genome annotation builds to identify identical CDS annotations, quality assurance testing and manual curation support. Identical CDS annotations are tracked with a CCDS identifier (ID), and any future change to the annotated CDS structure must be agreed upon by the collaborating members. CCDS curation guidelines were developed to address some aspects of curation in order to improve initial annotation consistency and to reduce time spent discussing proposed annotation updates. Here, we present the current status of the CCDS database and details of our procedures to track and coordinate our efforts. We also present the relevant background and reasoning behind the curation standards that we have developed for CCDS database treatment of transcripts that are nonsense-mediated decay (NMD) candidates, for transcripts containing upstream open reading frames, for identifying the most likely translation start codons and for the annotation of readthrough transcripts. Examples are provided to illustrate the application of these guidelines. DATABASE URL:
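The matching step described in the abstract (finding CDS annotations that are identical across independent whole-genome annotation builds and tracking each with an ID) can be illustrated with a minimal sketch. It assumes, purely for illustration, that a CDS structure is reduced to its chromosome, strand and ordered coding-exon intervals, and that two CDS annotations are "identical" when those keys match exactly. The `Annotation` class, the `index_by_cds` and `match_identical_cds` helpers, the transcript IDs and the `CCDS<n>.1` numbering are all hypothetical and do not reproduce the actual CCDS pipeline.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical representation: a CDS structure is identified by chromosome,
# strand and its ordered coding-exon intervals (0-based, half-open).
Interval = Tuple[int, int]
CdsKey = Tuple[str, str, Tuple[Interval, ...]]


@dataclass(frozen=True)
class Annotation:
    source: str            # illustrative label for the annotation build
    transcript_id: str     # hypothetical transcript accession
    chrom: str
    strand: str
    cds_exons: Tuple[Interval, ...]

    def key(self) -> CdsKey:
        return (self.chrom, self.strand, self.cds_exons)


def index_by_cds(annotations: List[Annotation]) -> Dict[CdsKey, List[str]]:
    """Group transcript IDs by their exact CDS structure."""
    index: Dict[CdsKey, List[str]] = {}
    for ann in annotations:
        index.setdefault(ann.key(), []).append(ann.transcript_id)
    return index


def match_identical_cds(build_a: List[Annotation], build_b: List[Annotation],
                        next_id: int = 1) -> Dict[str, List[str]]:
    """Give an illustrative tracking ID to each CDS found identically in both builds."""
    index_a = index_by_cds(build_a)
    index_b = index_by_cds(build_b)
    matches: Dict[str, List[str]] = {}
    # Only structures present in BOTH builds get an ID; sorting the shared
    # keys makes the ID assignment deterministic.
    for key in sorted(index_a.keys() & index_b.keys()):
        matches[f"CCDS{next_id}.1"] = index_a[key] + index_b[key]
        next_id += 1
    return matches


if __name__ == "__main__":
    build_a = [
        Annotation("A", "tx1", "chr1", "+", ((100, 200), (300, 350))),
        Annotation("A", "tx2", "chr1", "+", ((100, 200), (300, 360))),
    ]
    build_b = [Annotation("B", "txX", "chr1", "+", ((100, 200), (300, 350)))]
    print(match_identical_cds(build_a, build_b))  # {'CCDS1.1': ['tx1', 'txX']}
```

Requiring an exact key match mirrors the conservative intent described above: a CDS that differs by even one coding-exon boundary (as `tx2` does here) is not merged and receives no ID.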


Figure 1.
The flowchart outlines the CCDS review process (light gray boxes). CCDS IDs undergo status changes during and following the review process, as indicated by the colored boxes, where light green indicates ‘Public’ status, red indicates an ongoing review that has not yet reached consensus, orange indicates a pending update or withdrawal that has reached consensus, and purple indicates ‘Withdrawn’ status.
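The status changes in Figure 1 behave like a small state machine. The sketch below encodes the statuses named in the caption ('Public', under review without consensus, pending update or withdrawal after consensus, 'Withdrawn'). The exact set of allowed transitions, and the assumption that applying an agreed update re-publishes the record with its version number incremented (as CCDS4929.1 became version 2), are an illustrative reading of the flowchart, not the project's official rules; `Status`, `TRANSITIONS` and `CcdsRecord` are hypothetical names.

```python
from enum import Enum


class Status(Enum):
    PUBLIC = "Public"                           # light green
    UNDER_REVIEW = "Under review"               # red: no consensus yet
    PENDING_UPDATE = "Pending update"           # orange: consensus to update
    PENDING_WITHDRAWAL = "Pending withdrawal"   # orange: consensus to withdraw
    WITHDRAWN = "Withdrawn"                     # purple


# Assumed transitions; the real review workflow may allow others.
TRANSITIONS = {
    Status.PUBLIC: {Status.UNDER_REVIEW},
    Status.UNDER_REVIEW: {Status.PUBLIC, Status.PENDING_UPDATE,
                          Status.PENDING_WITHDRAWAL},
    Status.PENDING_UPDATE: {Status.PUBLIC},
    Status.PENDING_WITHDRAWAL: {Status.WITHDRAWN},
    Status.WITHDRAWN: set(),
}


class CcdsRecord:
    def __init__(self, accession: str, version: int = 1):
        self.accession = accession
        self.version = version
        self.status = Status.PUBLIC

    @property
    def ccds_id(self) -> str:
        return f"{self.accession}.{self.version}"

    def move_to(self, new: Status) -> None:
        if new not in TRANSITIONS[self.status]:
            raise ValueError(f"{self.status.value} -> {new.value} not allowed")
        # Applying an agreed update re-publishes the CDS under a new version.
        if self.status is Status.PENDING_UPDATE and new is Status.PUBLIC:
            self.version += 1
        self.status = new


if __name__ == "__main__":
    rec = CcdsRecord("CCDS4929")
    for step in (Status.UNDER_REVIEW, Status.PENDING_UPDATE, Status.PUBLIC):
        rec.move_to(step)
    print(rec.ccds_id, rec.status.value)  # CCDS4929.2 Public
```

Making 'Withdrawn' a terminal state with no outgoing transitions is one way to guarantee that a withdrawn ID is never silently reused.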
Figure 2.
UCSC Genome Browser view of the human KLHL35 (kelch-like 35) gene. CCDS8237.1 was based on AK091109.1 (mRNA track, blue). This CCDS ID has since been withdrawn because a retained intron introduces a premature termination codon, making the transcript an NMD candidate. CCDS44685.2, which represents the completely processed full-length variant, remains valid for this gene.
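Why a retained intron can make a transcript an NMD candidate is often explained with the widely cited "50-nucleotide rule": a stop codon lying more than about 50 nt upstream of the last exon-exon junction tends to trigger NMD. The sketch below applies that heuristic. It is an assumption for illustration; the caption does not state which criterion CCDS curators apply, and `is_nmd_candidate` and its parameters are hypothetical.

```python
from typing import List


def is_nmd_candidate(exon_lengths: List[int], stop_codon_end: int,
                     threshold: int = 50) -> bool:
    """Apply the 50-nt rule heuristic to a spliced transcript.

    exon_lengths: exon lengths in 5'->3' transcript order.
    stop_codon_end: 1-based transcript coordinate of the stop codon's last base.
    Returns True if the stop codon ends more than `threshold` nt upstream of
    the last exon-exon junction.
    """
    if len(exon_lengths) < 2:
        return False  # single-exon transcript: no downstream junction
    # Transcript coordinate of the last base before the final junction.
    last_junction = sum(exon_lengths[:-1])
    return last_junction - stop_codon_end > threshold


if __name__ == "__main__":
    # Stop codon far upstream of the final junction (e.g. within a retained intron).
    print(is_nmd_candidate([300, 400, 500], 350))  # True
    # Stop codon in the last exon, as in a fully spliced variant.
    print(is_nmd_candidate([300, 400, 500], 900))  # False
```

Counting positions in spliced transcript coordinates, rather than genomic ones, is what lets a retained intron shift the final junction downstream of an in-frame stop.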
Figure 3.
UCSC Genome Browser view of CCDS4929.1, which was updated to version 2, representing a variant of the human CRISP3 (cysteine-rich secretory protein 3) gene. The CDS was extended at the 5′-end. (a) Both the longer protein (258 amino acids) encoded by the update and the shorter protein (245 amino acids) have predicted signal peptides (SignalP 4.0) of 32 amino acids and 19 amino acids, respectively. (b and c) Base-level view. The upstream AUG start codon (b) has the weaker Kozak context (blue box) and is conserved only among primates (red box), whereas the downstream AUG (c) is conserved across a wider range of mammals (46-way alignment and conservation track).
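One ingredient of the start-codon comparison in Figure 3 is Kozak context strength. A common simplification scores only the two most influential positions, a purine (A/G) at −3 and a G at +4 (with the A of AUG as +1), and calls a context strong, adequate or weak when both, one or neither match. The sketch below implements that simplification; it is an assumed scoring scheme, not necessarily the one used by CCDS curators, it ignores the conservation evidence the caption also weighs, and the example sequences are hypothetical rather than CRISP3.

```python
def kozak_strength(seq: str, atg_index: int) -> str:
    """Classify the Kozak context of the AUG/ATG starting at atg_index (0-based).

    Scores two key positions: a purine at -3 and a G at +4 (A of ATG = +1).
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    minus3 = seq[atg_index - 3] if atg_index >= 3 else ""
    plus4 = seq[atg_index + 3] if atg_index + 3 < len(seq) else ""
    hits = (minus3 in ("A", "G")) + (plus4 == "G")
    return {2: "strong", 1: "adequate", 0: "weak"}[hits]


if __name__ == "__main__":
    print(kozak_strength("GCCACCATGG", 6))  # strong
    print(kozak_strength("GCCTCCATGT", 6))  # weak
```

The protein lengths in panel (a) can also be checked directly: the 5′ extension adds 258 − 245 = 13 amino acids, and 258 − 32 = 245 − 19 = 226, so if both predicted signal peptides were cleaved at their predicted sites, the two variants would yield the same 226-residue mature protein.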

Similar articles

  • Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation.
    Pujar S, O'Leary NA, Farrell CM, Loveland JE, Mudge JM, Wallin C, Girón CG, Diekhans M, Barnes I, Bennett R, Berry AE, Cox E, Davidson C, Goldfarb T, Gonzalez JM, Hunt T, Jackson J, Joardar V, Kay MP, Kodali VK, Martin FJ, McAndrews M, McGarvey KM, Murphy M, Rajput B, Rangwala SH, Riddick LD, Seal RL, Suner MM, Webb D, Zhu S, Aken BL, Bruford EA, Bult CJ, Frankish A, Murphy T, Pruitt KD. Nucleic Acids Res. 2018 Jan 4;46(D1):D221-D228. doi: 10.1093/nar/gkx1031. PMID: 29126148. Free PMC article.
  • Current status and new features of the Consensus Coding Sequence database.
    Farrell CM, O'Leary NA, Harte RA, Loveland JE, Wilming LG, Wallin C, Diekhans M, Barrell D, Searle SM, Aken B, Hiatt SM, Frankish A, Suner MM, Rajput B, Steward CA, Brown GR, Bennett R, Murphy M, Wu W, Kay MP, Hart J, Rajan J, Weber J, Snow C, Riddick LD, Hunt T, Webb D, Thomas M, Tamez P, Rangwala SH, McGarvey KM, Pujar S, Shkeda A, Mudge JM, Gonzalez JM, Gilbert JG, Trevanion SJ, Baertsch R, Harrow JL, Hubbard T, Ostell JM, Haussler D, Pruitt KD. Nucleic Acids Res. 2014 Jan;42(Database issue):D865-72. doi: 10.1093/nar/gkt1059. Epub 2013 Nov 11. PMID: 24217909. Free PMC article.
  • The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes.
    Pruitt KD, Harrow J, Harte RA, Wallin C, Diekhans M, Maglott DR, Searle S, Farrell CM, Loveland JE, Ruef BJ, Hart E, Suner MM, Landrum MJ, Aken B, Ayling S, Baertsch R, Fernandez-Banet J, Cherry JL, Curwen V, Dicuccio M, Kellis M, Lee J, Lin MF, Schuster M, Shkeda A, Amid C, Brown G, Dukhanina O, Frankish A, Hart J, Maidak BL, Mudge J, Murphy MR, Murphy T, Rajan J, Rajput B, Riddick LD, Snow C, Steward C, Webb D, Weber JA, Wilming L, Wu W, Birney E, Haussler D, Hubbard T, Ostell J, Durbin R, Lipman D. Genome Res. 2009 Jul;19(7):1316-23. doi: 10.1101/gr.080531.108. Epub 2009 Jun 4. PMID: 19498102. Free PMC article.
  • No wisdom in the crowd: genome annotation in the era of big data - current status and future prospects.
    Danchin A, Ouzounis C, Tokuyasu T, Zucker JD. Microb Biotechnol. 2018 Jul;11(4):588-605. doi: 10.1111/1751-7915.13284. Epub 2018 May 28. PMID: 29806194. Free PMC article. Review.
  • The curation of genetic variants: difficulties and possible solutions.
    Pandey KR, Maden N, Poudel B, Pradhananga S, Sharma AK. Genomics Proteomics Bioinformatics. 2012 Dec;10(6):317-25. doi: 10.1016/j.gpb.2012.06.006. Epub 2012 Nov 29. PMID: 23317699. Free PMC article. Review.

Cited by 35 articles
