Low-coverage massively parallel pyrosequencing of cDNAs enables proteomics in non-model species: comparison of a species-specific database generated by pyrosequencing with databases from related species for proteome analysis of pea chloroplast envelopes

Andrea Bräutigam; Roshan P Shrestha; Doug Whitten; Curtis G Wilkerson; Kevin M Carr; John E Froehlich; Andreas P M Weber

doi:10.1016/j.jbiotec.2008.02.007

J Biotechnol. 2008 Aug 31;136(1-2):44-53. doi: 10.1016/j.jbiotec.2008.02.007. Epub 2008 Feb 17.

Affiliation

Andrea Bräutigam: Institut für Biochemie der Pflanzen, Heinrich-Heine-Universität, Universitätsstrasse 1, D-40225 Düsseldorf, Germany.

PMID: 18394738

Abstract

Proteomics is a valuable tool for establishing and comparing the protein content of defined tissues, cell types, or subcellular structures. Its use in non-model species is currently limited because the identification of peptides critically depends on sequence databases. In this study, we explored the potential of a preliminary cDNA database for the non-model species Pisum sativum created by a small number of massively parallel pyrosequencing (MPSS) runs for its use in proteomics and compared it to comprehensive cDNA databases from Medicago truncatula and Arabidopsis thaliana created by Sanger sequencing. Each database was used to identify proteins from a pea leaf chloroplast envelope preparation. It is shown that the pea database identified more proteins with higher accuracy, although the sequence quality was low and the sequence contigs were short compared to databases from model species. Although the number of identified proteins in non-species-specific databases could potentially be increased by lowering the threshold for successful protein identifications, this strategy markedly increases the number of wrongly identified proteins. The identification rate with non-species-specific databases correlated with spectral abundance but not with the predicted membrane helix content, and strong conservation is necessary but not sufficient for protein identification with a non-species-specific database. It is concluded that massively parallel sequencing of cDNAs substantially increases the power of proteomics in non-model species.

Publication types

Comparative Study
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Base Sequence
Cell Membrane / genetics*
Chromosome Mapping / methods
DNA, Chloroplast / genetics*
Database Management Systems
Databases, Genetic
Genome, Plant / genetics*
Molecular Sequence Data
Open Reading Frames / genetics
Pisum sativum / genetics*
Plant Proteins / genetics*
Proteome / genetics*
Sequence Analysis, DNA / methods*

Substances

DNA, Chloroplast
Plant Proteins
Proteome