Concerted action of the new Genomic Peptide Finder and AUGUSTUS allows for automated proteogenomic annotation of the Chlamydomonas reinhardtii genome

Proteomics. 2011 May;11(9):1814-23. doi: 10.1002/pmic.201000621. Epub 2011 Mar 22.

Abstract

The use and development of post-genomic tools naturally depend on large-scale genome sequencing projects. The usefulness of post-genomic applications depends in turn on the accuracy of genome annotations, and the correct identification of intron-exon borders in the complex genomes of eukaryotic organisms is often an error-prone task. Although automated algorithms for predicting intron-exon structures are available, supporting exon evidence is necessary to achieve comprehensive genome annotation. Besides cDNA and EST support, peptides identified via MS/MS can be used as extrinsic evidence in a proteogenomic approach. We describe an improved version of the Genomic Peptide Finder (GPF), which aligns de novo predicted amino acid sequences to the genomic DNA sequence of an organism while correcting for peptide sequencing errors and accounting for the possibility of splicing. We have coupled GPF with the gene finding program AUGUSTUS in a way that provides automatic structural annotations of the Chlamydomonas reinhardtii genome, using highly unbiased GPF evidence. A comparison of the AUGUSTUS gene set incorporating GPF evidence with the standard JGI FM4 (Filtered Models 4) gene set reveals 932 GPF peptides that are not contained in the FM4 gene set. Furthermore, the GPF evidence improved the AUGUSTUS gene models by altering 65 gene models and adding three previously unidentified genes.
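The idea of aligning a de novo peptide to genomic DNA, with or without an intervening intron, can be illustrated with a toy sketch. This is not the GPF algorithm: the function names, the `max_intron` parameter, and the restriction to exact matches, forward-strand splicing, and introns that fall between codons with GT...AG boundaries are all simplifying assumptions made here for illustration. The only error tolerance modelled is treating isoleucine and leucine as interchangeable, since they have the same mass and cannot be told apart by standard MS/MS; GPF's correction of other sequencing errors is not reproduced.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def revcomp(dna):
    """Reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def normalize(peptide):
    """Collapse I to L, which are isobaric in MS/MS."""
    return peptide.replace("I", "L")


def find_unspliced(genome, peptide):
    """Exact hits in all six frames as (strand, frame, amino-acid offset)."""
    hits = []
    query = normalize(peptide)
    for strand, seq in (("+", genome), ("-", revcomp(genome))):
        for frame in range(3):
            protein = normalize(translate(seq[frame:]))
            pos = protein.find(query)
            while pos != -1:
                hits.append((strand, frame, pos))
                pos = protein.find(query, pos + 1)
    return hits


def find_spliced(genome, peptide, max_intron=200):
    """Forward-strand hits split by one GT...AG intron between codons.

    Returns (exon1_start, intron_start, exon2_start) nucleotide offsets.
    """
    hits = []
    query = normalize(peptide)
    for cut in range(1, len(query)):
        left, right = query[:cut], query[cut:]
        for start in range(len(genome) - 3 * cut + 1):
            donor = start + 3 * cut
            if normalize(translate(genome[start:donor])) != left:
                continue
            if genome[donor:donor + 2] != "GT":
                continue
            last = min(donor + max_intron, len(genome))
            for exon2 in range(donor + 4, last + 1):
                if genome[exon2 - 2:exon2] != "AG":
                    continue
                tail = genome[exon2:exon2 + 3 * len(right)]
                if normalize(translate(tail)) == right:
                    hits.append((start, donor, exon2))
    return hits


if __name__ == "__main__":
    # Two toy exons (MK and LV) separated by a short GT...AG intron.
    genome = "ATGAAA" + "GTCCCCAG" + "CTGGTG"
    print(find_unspliced(genome, "MKIV"))
    print(find_spliced(genome, "MKIV"))
```

In this toy genome the peptide is only recoverable when the intron is skipped, which is the situation in which peptide evidence can inform intron-exon borders for a gene finder.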

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Base Sequence
  • Chlamydomonas reinhardtii / genetics*
  • Chlamydomonas reinhardtii / metabolism*
  • Computational Biology / methods*
  • Databases, Genetic
  • Exons / genetics
  • Genome, Plant / genetics
  • Genomics / methods*
  • Introns / genetics
  • Mass Spectrometry
  • Molecular Sequence Data
  • Peptides / analysis
  • Peptides / genetics
  • Plant Proteins / analysis
  • Plant Proteins / genetics
  • Proteomics / methods*
  • RNA Splice Sites / genetics
  • Sequence Homology, Amino Acid
  • Software

Substances

  • Peptides
  • Plant Proteins
  • RNA Splice Sites