Proteogenomics to discover the full coding content of genomes: a computational perspective

Natalie Castellana; Vineet Bafna

doi:10.1016/j.jprot.2010.06.007

J Proteomics. 2010 Oct 10;73(11):2124-35. doi: 10.1016/j.jprot.2010.06.007. Epub 2010 Jul 8.

Authors

Natalie Castellana¹, Vineet Bafna

Affiliation

¹ Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093-0404, USA.

Abstract

Proteogenomics has emerged as a field at the junction of genomics and proteomics. It is a loose collection of technologies that allow tandem mass spectra to be searched against genomic databases in order to identify and characterize protein-coding genes. Proteogenomic peptides provide information for gene annotation that is difficult or impossible to obtain using standard annotation methods. Examples include confirmation of translation, reading-frame determination, identification of gene and exon boundaries, evidence for post-translational processing, identification of splice forms, including alternative splicing, and prediction of completely novel genes. For proteogenomics to deliver on its promise, however, it must overcome a number of technological hurdles, including the speed and accuracy of peptide identification, the construction and search of specialized databases, and the correction of sampling bias. This article reviews the state of the art of the field, focusing on its current successes and on the role of computation in overcoming these challenges. We describe how technological and algorithmic advances have already enabled large-scale proteogenomic studies in many model organisms, including Arabidopsis, yeast, fly, and human. We also provide a preview of the field going forward, describing early efforts to tackle the problems of complex gene structures, searching against genomes of related species, and immunoglobulin gene reconstruction.
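
As an illustration of the reading-frame determination mentioned in the abstract, the following is a minimal sketch, not the authors' method. It assumes a peptide has already been identified from a tandem mass spectrum by some upstream search tool, and it only shows how that peptide could be located in a six-frame translation of a genomic sequence. The function names (`six_frame_translations`, `locate_peptide`) and the toy sequence are hypothetical; the standard genetic code is assumed, and introns and splicing are ignored.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (assumption:
# the organism uses the standard code; '*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map (strand, offset) to the translation of that reading frame."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[(strand, offset)] = translate(seq[offset:])
    return frames


def locate_peptide(peptide, dna):
    """Return every (strand, offset) frame whose translation contains the peptide."""
    return [
        frame
        for frame, protein in six_frame_translations(dna).items()
        if peptide in protein
    ]


# Toy genomic fragment (hypothetical) encoding M-A-K-L-T followed by a stop.
toy_dna = "ATGGCCAAGTTGACCTGA"
hits = locate_peptide("AKLT", toy_dna)
```

In this toy case the peptide maps to a single frame on the forward strand, which is the kind of evidence that would confirm translation and fix the reading frame. A realistic pipeline would also have to handle peptides that span splice junctions and would need to control for false identifications, which this sketch does not attempt.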

Publication types

Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.
Review

MeSH terms

Animals
Computational Biology / methods*
Computational Biology / trends
Genome / genetics*
Humans
Open Reading Frames / genetics*
Proteome / genetics*
Proteomics / methods
Proteomics / trends

Substances

Proteome
