Nat Methods. 2008 Oct;5(10):873-5.
doi: 10.1038/nmeth.1254. Epub 2008 Sep 21.

Building consensus spectral libraries for peptide identification in proteomics

Henry Lam et al. Nat Methods. 2008 Oct.

Abstract

Spectral searching has drawn increasing interest as an alternative to sequence-database searching in proteomics. We developed and validated an open-source software toolkit, SpectraST, to enable proteomics researchers to build spectral libraries and to integrate this promising approach in their data-analysis pipeline. It allows individual researchers to condense raw data into spectral libraries, summarizing information about observed proteomes into a concise and retrievable format for future data analyses.

Figures

Figure 1
A schematic diagram showing the various library-building functionalities of SpectraST. Pertinent file formats are given in parentheses.
Figure 2
An example of consensus spectrum building. (a–f) Raw replicate spectra assigned to the same peptide ion SITLFVQEDR (charge +2) by SEQUEST at probabilities above 0.9. (g) Resulting consensus spectrum created for this peptide ion by SpectraST. Solid lines: annotated peaks (annotations shown for common ions); dotted lines: unannotated peaks. Various quality measures of the replicates are listed in Table 2. All six replicates are from the same dataset, HUPOPPP34/HUPO34_b1-SERUM, acquired on a ThermoFinnigan LTQ at a collision energy of 25.
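The idea behind panel (g), merging replicates of one peptide ion into a single cleaner spectrum, can be illustrated with a minimal Python sketch. It is not SpectraST's actual algorithm. It assumes a simplified scheme in which each replicate is scaled to its base peak, peaks are pooled into fixed-width m/z bins, and a bin survives only if it appears in at least half of the replicates. The names build_consensus, bin_width and min_fraction are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def build_consensus(replicates: List[List[Peak]], bin_width: float = 1.0,
                    min_fraction: float = 0.5) -> List[Peak]:
    """Merge replicate spectra of one peptide ion into a consensus spectrum.

    Simplified scheme for illustration (an assumption, not SpectraST's exact procedure).
    """
    n = len(replicates)
    if n == 0:
        return []
    # m/z bin index -> list of (replicate index, m/z, base-peak-normalized intensity)
    bins: Dict[int, List[Tuple[int, float, float]]] = defaultdict(list)
    for r, spectrum in enumerate(replicates):
        base_peak = max((inten for _, inten in spectrum), default=0.0)
        if base_peak <= 0:
            continue  # empty or all-zero replicate contributes nothing
        for mz, inten in spectrum:
            if inten > 0:
                # scale to the base peak so no single replicate dominates
                bins[int(mz // bin_width)].append((r, mz, inten / base_peak))
    consensus: List[Peak] = []
    for b in sorted(bins):
        entries = bins[b]
        n_present = len({r for r, _, _ in entries})
        if n_present < min_fraction * n:
            continue  # seen in too few replicates: treat as noise
        total = sum(inten for _, _, inten in entries)
        mz = sum(m * inten for _, m, inten in entries) / total  # intensity-weighted m/z
        consensus.append((mz, total / n))  # mean intensity; absent replicates count as 0
    return consensus


replicates = [
    [(100.0, 10.0), (200.2, 50.0), (333.3, 5.0)],
    [(100.1, 20.0), (200.1, 100.0)],
    [(100.2, 10.0), (200.0, 50.0), (450.0, 3.0)],
]
print(build_consensus(replicates))  # two peaks survive, near m/z 100.1 and 200.1
```

Requiring a peak to recur across replicates is what discards sporadic noise peaks such as the ones at m/z 333.3 and 450.0 above. A practical version would also need to handle peaks that straddle a bin boundary.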
Figure 3
Reduction of noise after consensus creation, by the number of replicates used. The average peak reduction factor (bars, left axis) is the average, over all library entries in that bin, of the peak reduction factor, which is defined as the average number of peaks in the replicate spectra divided by that in the consensus spectrum. The average fraction of peaks annotated in consensus (line, right axis) is the average, over all library entries in that bin, of the fraction of peaks that are annotated in the consensus spectrum. Note also that the average fraction of annotated peaks in the raw replicate spectra is about 42% (not shown in the figure).
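Both quantities plotted in Figure 3 follow directly from the definitions in the caption. A minimal sketch, assuming each library entry is summarized by its replicates' peak counts, its consensus peak count and a per-peak annotation flag for the consensus spectrum; this input layout and all function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Sequence, Tuple

# (peak counts of the replicate spectra, consensus peak count, annotation flag per consensus peak)
Entry = Tuple[Sequence[int], int, Sequence[bool]]


def peak_reduction_factor(replicate_peak_counts: Sequence[int], consensus_peak_count: int) -> float:
    """Average number of peaks in the replicate spectra divided by that in the consensus."""
    return mean(replicate_peak_counts) / consensus_peak_count


def fraction_annotated(annotated: Sequence[bool]) -> float:
    """Fraction of the consensus spectrum's peaks that are annotated."""
    return sum(annotated) / len(annotated)


def summarize_by_replicates(entries: Iterable[Entry]) -> Dict[int, Tuple[float, float]]:
    """Average both measures over all library entries built from the same number of replicates."""
    groups: Dict[int, List[Tuple[float, float]]] = defaultdict(list)
    for counts, n_consensus, annotated in entries:
        groups[len(counts)].append(
            (peak_reduction_factor(counts, n_consensus), fraction_annotated(annotated)))
    return {k: (mean(prf for prf, _ in v), mean(fa for _, fa in v))
            for k, v in sorted(groups.items())}


library = [
    ([300, 250, 350], 100, [True] * 40 + [False] * 60),
    ([200, 220, 180], 50, [True] * 30 + [False] * 20),
    ([120, 80], 40, [True] * 10 + [False] * 30),
]
print(summarize_by_replicates(library))  # {2: (2.5, 0.25), 3: (3.5, 0.5)}
```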
Figure 4
Venn diagram of quality-filtered spectra. The three categories of questionable spectra determined by SpectraST (Impure, Conflicting ID and Single) are described in the Experimental Procedure section.
Figure 5
Receiver operating characteristic (ROC) curves, as estimated by PeptideProphet, for SpectraST searches of all 40 datasets used in the study against consensus spectral libraries at three quality levels, Q0 (squares), Q1 (triangles) and Q2 (circles, solid curve), and against a best-replicate spectral library, Q2-BR (circles, dotted curve).
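Because PeptideProphet assigns a probability to each identification, an ROC-style curve can be estimated without knowing which identifications are truly correct. A minimal sketch, assuming the probabilities are well calibrated, so that accepting an identification with probability p adds p expected correct and 1 - p expected incorrect identifications. This is an illustration of the general idea, not necessarily how the curves in this figure were computed, and estimated_roc is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def estimated_roc(probabilities: Sequence[float]) -> List[Tuple[float, float]]:
    """Estimated (incorrect, correct) identification counts as the probability threshold is lowered.

    Assumes calibrated probabilities: accepting an identification with probability p
    adds p expected correct and 1 - p expected incorrect identifications.
    """
    correct = incorrect = 0.0
    curve = [(0.0, 0.0)]
    for p in sorted(probabilities, reverse=True):  # accept the most confident first
        correct += p
        incorrect += 1.0 - p
        curve.append((incorrect, correct))
    return curve


curve = estimated_roc([0.5, 1.0, 0.9])
print(curve[-1])  # roughly (0.6, 2.4): all three identifications accepted
```

Dividing each correct count by the final (total expected) correct count would turn the y-axis into an estimated sensitivity.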
Figure 6
Average fraction of scaled intensity retained as a function of the maximum number of peaks retained per library spectrum, across all spectra in the Q2 library. Scaled intensity is defined as the square root of the raw intensity; it is the measure used to calculate dot products during spectral searching (Ref 11). Error bars represent one standard deviation of the values calculated for all spectra in the Q2 library.
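The quantity in Figure 6 can be computed per spectrum and then averaged over the library. A minimal sketch, assuming each library spectrum is given as a list of raw peak intensities; the function names are hypothetical, and using the sample standard deviation for the error bars is an assumption.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def fraction_scaled_intensity_retained(intensities: Sequence[float], max_peaks: int) -> float:
    """Share of a spectrum's total scaled intensity (square root of raw) kept by its top max_peaks peaks."""
    scaled = sorted((math.sqrt(i) for i in intensities), reverse=True)
    total = sum(scaled)
    return sum(scaled[:max_peaks]) / total if total > 0 else 0.0


def library_summary(library: List[Sequence[float]], max_peaks: int) -> Tuple[float, float]:
    """Mean and sample standard deviation of the retained fraction over all library spectra."""
    values = [fraction_scaled_intensity_retained(s, max_peaks) for s in library]
    spread = stdev(values) if len(values) > 1 else 0.0
    return mean(values), spread


spectrum = [100.0, 25.0, 16.0, 4.0, 1.0]  # scaled: 10, 5, 4, 2, 1 (total 22)
print(fraction_scaled_intensity_retained(spectrum, 2))  # 15/22, about 0.68
```

Because the square root compresses large intensities, a handful of dominant peaks carries less of the total than it would on the raw scale, which is why the retained fraction is measured on the scaled values used for scoring.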
Figure 7
Receiver operating characteristic (ROC) curves, as estimated by PeptideProphet, for three SpectraST searches of all 40 datasets used in the study, illustrating the effect of library spectrum simplification. The searches were run against consensus spectral libraries with three different maximum numbers of retained peaks: Q2 (full spectra retained, circles), Q2-20p (top 20 peaks retained, diamonds) and Q2-50p (top 50 peaks retained, triangles).
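Library simplification amounts to keeping only the N most intense peaks of each library spectrum before matching. The sketch below pairs that step with a normalized dot product on square-root-scaled intensities. The 1 m/z-unit binning and all names are assumptions for illustration; SpectraST's actual scoring involves details not described here.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (m/z, raw intensity)


def top_peaks(spectrum: List[Peak], n: int) -> List[Peak]:
    """Simplify a library spectrum by keeping only its n most intense peaks."""
    return sorted(spectrum, key=lambda peak: peak[1], reverse=True)[:n]


def binned_vector(spectrum: List[Peak], bin_width: float = 1.0) -> Dict[int, float]:
    """Sum square-root-scaled intensities into fixed-width m/z bins (assumed binning)."""
    vec: Dict[int, float] = defaultdict(float)
    for mz, intensity in spectrum:
        vec[int(mz // bin_width)] += math.sqrt(intensity)
    return vec


def dot_product(query: List[Peak], library: List[Peak]) -> float:
    """Normalized dot product (cosine) of two spectra; 1.0 means identical shape."""
    q, lib = binned_vector(query), binned_vector(library)
    num = sum(v * lib.get(k, 0.0) for k, v in q.items())
    norm = math.sqrt(sum(v * v for v in q.values()) * sum(v * v for v in lib.values()))
    return num / norm if norm > 0 else 0.0


library_entry = [(100.0, 400.0), (200.0, 100.0), (300.0, 1.0)]
query = [(100.2, 400.0), (200.3, 100.0)]
print(dot_product(query, library_entry))               # slightly below 1.0
print(dot_product(query, top_peaks(library_entry, 2)))  # 1.0
```

Dropping the weak peak at m/z 300 changes the score only marginally, which is the kind of trade-off between library size and search performance that this figure examines.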

References

    1. Aebersold R, Mann M. Mass spectrometry-based proteomics. Nature. 2003;422:198–207.
    2. Sadygov RG, Cociorva D, Yates JR III. Large-scale database searching using tandem mass spectra: looking up the answer in the back of the book. Nat Methods. 2004;1:195–202.
    3. Steen H, Mann M. The ABC’s (and XYZ’s) of peptide sequencing. Nat Rev Mol Cell Biol. 2004;5:699–711.
    4. Eng JK, McCormack AL, Yates JR III. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J Am Soc Mass Spectrom. 1994;5:976–989.
    5. Domon B, Aebersold R. Challenges and opportunities in proteomics data analysis. Mol Cell Proteomics. 2006;5:1921–1926.
