Low-coverage massively parallel pyrosequencing of cDNAs enables proteomics in non-model species: comparison of a species-specific database generated by pyrosequencing with databases from related species for proteome analysis of pea chloroplast envelopes

J Biotechnol. 2008 Aug 31;136(1-2):44-53. doi: 10.1016/j.jbiotec.2008.02.007. Epub 2008 Feb 17.


Proteomics is a valuable tool for establishing and comparing the protein content of defined tissues, cell types, or subcellular structures. Its use in non-model species is currently limited because the identification of peptides critically depends on sequence databases. In this study, we explored the potential of a preliminary cDNA database for the non-model species Pisum sativum created by a small number of massively parallel pyrosequencing (MPSS) runs for its use in proteomics and compared it to comprehensive cDNA databases from Medicago truncatula and Arabidopsis thaliana created by Sanger sequencing. Each database was used to identify proteins from a pea leaf chloroplast envelope preparation. It is shown that the pea database identified more proteins with higher accuracy, although the sequence quality was low and the sequence contigs were short compared to databases from model species. Although the number of identified proteins in non-species-specific databases could potentially be increased by lowering the threshold for successful protein identifications, this strategy markedly increases the number of wrongly identified proteins. The identification rate with non-species-specific databases correlated with spectral abundance but not with the predicted membrane helix content, and strong conservation is necessary but not sufficient for protein identification with a non-species-specific database. It is concluded that massively parallel sequencing of cDNAs substantially increases the power of proteomics in non-model species.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Base Sequence
  • Cell Membrane / genetics*
  • Chromosome Mapping / methods
  • DNA, Chloroplast / genetics*
  • Database Management Systems
  • Databases, Genetic
  • Genome, Plant / genetics*
  • Molecular Sequence Data
  • Open Reading Frames / genetics
  • Pisum sativum / genetics*
  • Plant Proteins / genetics*
  • Proteome / genetics*
  • Sequence Analysis, DNA / methods*

Substances

  • DNA, Chloroplast
  • Plant Proteins
  • Proteome