De novo derivation of proteomes from transcriptomes for transcript and protein identification

Nat Methods. 2012 Dec;9(12):1207-11. doi: 10.1038/nmeth.2227. Epub 2012 Nov 11.


Identification of proteins by tandem mass spectrometry requires a reference protein database, but such databases are available only for model species. Here we demonstrate that, for a non-model species, sequencing expressed mRNA can generate a protein database for mass spectrometry-based identification. This combination of high-throughput sequencing and protein identification technologies allows detection of both genes and proteins. We use human cells infected with human adenovirus as a complex and dynamic model to demonstrate the robustness of this approach. Our proteomics informed by transcriptomics (PIT) technique identifies >99% of the more than 3,700 distinct proteins identified by traditional analysis, which relies on comprehensive human and adenovirus protein lists. We show that this approach can also be used to highlight genes and proteins undergoing dynamic changes in post-transcriptional protein stability.
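
The core idea of deriving a protein search database from transcript sequences can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which the abstract does not describe; it assumes transcripts have already been assembled, uses the standard genetic code, and keeps methionine-initiated open reading frames from all six frames. The function names and the `min_len` cutoff are hypothetical choices made for illustration.

```python
# Illustrative sketch only: six-frame translation of assembled transcripts
# into candidate protein sequences for a mass spectrometry search database.
# Assumptions (not from the abstract): standard genetic code, ORFs start at
# the first methionine after a stop codon, arbitrary minimum length cutoff.

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate a DNA sequence codon by codon; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_orfs(seq, min_len=30):
    """Collect Met-initiated ORFs of at least min_len residues from six frames."""
    orfs = set()
    for strand in (seq.upper(), reverse_complement(seq)):
        for frame in range(3):
            # Split the translated frame at stop codons ('*').
            for segment in translate(strand[frame:]).split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start >= min_len:
                    orfs.add(segment[start:])
    return sorted(orfs)


def write_fasta(orfs, path, prefix="orf"):
    """Write candidate proteins to a FASTA file for a search engine."""
    with open(path, "w") as handle:
        for index, protein in enumerate(orfs, start=1):
            handle.write(f">{prefix}_{index}\n{protein}\n")
```

For example, `six_frame_orfs("ATGGCCAAATAA", min_len=3)` yields a single candidate peptide, `MAK`. A real workflow would apply a much longer length cutoff and pass the resulting FASTA file to a mass spectrometry search engine.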

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adenoviridae / genetics
  • Adenoviridae / metabolism
  • Animals
  • Arginine / metabolism
  • CHO Cells
  • Carbon Isotopes
  • Chromatography, Liquid
  • Cricetinae
  • Cricetulus
  • Databases, Protein
  • HeLa Cells
  • Humans
  • Lysine / metabolism
  • Nitrogen Isotopes
  • Nuclear Proteins / metabolism
  • Polymorphism, Single Nucleotide
  • Proteome / chemistry*
  • Proteomics / methods*
  • RNA, Messenger / metabolism
  • RNA-Binding Proteins / metabolism
  • Sequence Analysis, Protein / methods
  • Software
  • Tandem Mass Spectrometry / methods
  • Transcriptome*


Substances

  • Carbon Isotopes
  • Nitrogen Isotopes
  • Nuclear Proteins
  • POLDIP3 protein, human
  • Proteome
  • RNA, Messenger
  • RNA-Binding Proteins
  • Arginine
  • Lysine