Proteogenomic Methods to Improve Genome Annotation

Methods Mol Biol. 2016;1410:77-89. doi: 10.1007/978-1-4939-3524-6_5.

Abstract

Annotation of protein-coding genes in sequenced genomes has routinely been carried out using gene prediction programs guided by available transcript data. Advances in mass spectrometry have enabled the high-throughput identification of proteins. In addition to being searched against proteins annotated in public databases, mass spectrometry data can be searched against conceptually translated genome and transcriptome sequences to identify novel protein-coding regions. This proteogenomic approach has led to the identification of novel protein-coding regions in both prokaryotic and eukaryotic genomes. These studies have also revealed that some annotated noncoding RNAs and pseudogenes code for proteins. This approach is likely to become part of most genome annotation workflows in the future. Here we describe a general methodology that can be used for proteogenomics.
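To make "conceptually translated genome" concrete, the following is a minimal sketch of one common way to build such a search database: a six-frame translation that keeps every stop-to-stop stretch above a length cutoff. It is illustrative only and assumes the standard genetic code (NCBI translation table 1) and an uppercase DNA string; the function names and the `min_len` cutoff are hypothetical choices, not the authors' protocol.

```python
from typing import Iterator, Tuple

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
_BASES = "TCAG"
_CODONS = [a + b + c for a in _BASES for b in _BASES for c in _BASES]
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(_CODONS, _AMINO_ACIDS))

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def translate(seq: str, frame: int) -> str:
    """Translate seq from offset `frame`; '*' marks stop codons, 'X' unknown codons."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(frame, len(seq) - 2, 3)
    )


def six_frame_orfs(genome: str, min_len: int = 7) -> Iterator[Tuple[str, int, int, str]]:
    """Yield (strand, frame, aa_offset, sequence) for each stop-to-stop stretch of
    at least min_len residues in all six frames. Hypothetical helper: on the '-'
    strand, aa_offset is relative to the reverse complement, not the input."""
    genome = genome.upper()
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            protein = translate(seq, frame)
            offset = 0
            for segment in protein.split("*"):
                if len(segment) >= min_len:
                    yield strand, frame, offset, segment
                offset += len(segment) + 1  # step past the segment and its stop codon
```

For example, `translate("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG", 0)` gives `"MAIVMGR*KGAR*"`. In a workflow like the one described, the retained segments could be written to FASTA alongside the annotated proteome and searched together; peptides matching only the translated-genome entries would be candidates for novel protein-coding regions.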

Keywords: Mass spectrometry; Noncoding RNAs; Novel proteins; Proteogenomics; Pseudogenes.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Genomics / methods
  • Humans
  • Molecular Sequence Annotation / methods
  • Open Reading Frames / genetics
  • Proteogenomics / methods*
  • Proteomics / methods