J Proteome Res. 10 (9), 3871-9

Faster SEQUEST Searching for Peptide Identification From Tandem Mass Spectra

Benjamin J Diament et al. J Proteome Res.

Abstract

Computational analysis of mass spectra remains the bottleneck in many proteomics experiments. SEQUEST was one of the earliest software packages to identify peptides from mass spectra by searching a database of known peptides. Though still popular, SEQUEST performs slowly. Crux and TurboSEQUEST have successfully sped up SEQUEST by adding a precomputed index to the search, but the demand for ever-faster peptide identification software continues to grow. Tide, introduced here, is a software program that implements the SEQUEST algorithm for peptide identification and that achieves a dramatic speedup over Crux and SEQUEST. The optimization strategies detailed here employ a combination of algorithmic and software engineering techniques to achieve speeds up to 170 times faster than a recent version of SEQUEST that uses indexing. For example, on a single Xeon CPU, Tide searches 10,000 spectra against a tryptic database of 27,499 Caenorhabditis elegans proteins at a rate of 1550 spectra per second, which compares favorably with a rate of 8.8 spectra per second for a recent version of SEQUEST with index running on the same hardware.
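One way a precomputed index can help is by keeping candidate peptides sorted by mass, so that finding every peptide inside a precursor mass tolerance window takes two binary searches instead of a scan of the whole database. The sketch below is illustrative only: `PeptideIndex`, `build`, and `candidates` are hypothetical names, not the API of Tide, Crux, or SEQUEST. It also assumes that peptide masses have already been computed.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field


@dataclass
class PeptideIndex:
    """Hypothetical in-memory index: peptide sequences sorted by mass (Da)."""
    masses: list[float] = field(default_factory=list)
    sequences: list[str] = field(default_factory=list)

    @classmethod
    def build(cls, peptides: dict[str, float]) -> "PeptideIndex":
        # Sort once, up front; every later query reuses this ordering.
        ordered = sorted(peptides.items(), key=lambda kv: kv[1])
        return cls(masses=[m for _, m in ordered],
                   sequences=[s for s, _ in ordered])

    def candidates(self, precursor_mass: float, tolerance: float) -> list[str]:
        # Two binary searches bound the +/- tolerance window in O(log n).
        lo = bisect_left(self.masses, precursor_mass - tolerance)
        hi = bisect_right(self.masses, precursor_mass + tolerance)
        return self.sequences[lo:hi]
```

For example, `PeptideIndex.build({"PEPTIDE": 799.36, "ELVISK": 701.40, "SAMPLER": 802.40}).candidates(800.0, 3.0)` returns `["PEPTIDE", "SAMPLER"]`. The masses are placeholder values chosen for the example.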

Figures

Figure 1. Data flow in Tide before and after optimization
Figure 2
Profile of various development stages of Tide for the worm benchmark (10,000 spectra). Each profile shows how much computing time was spent in each of the major phases of Tide’s operation at various points during development. Such profiles aided in deciding how best to proceed with optimization efforts. Profiles shown are (a) Tide-v0; (b) before and after linearizing background subtraction (Supplement Section 3); (c) before and after fivefold sparser representation, and after storing d to disk (Supplement Section 7); and (d) the current version of Tide. For each plot, the (diminishing) total execution time is indicated via the y-axis scale.
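Panel (b) refers to linearizing background subtraction. The underlying identity can be stated briefly. XCorr subtracts a mean of offset dot products from the zero-offset dot product, and a dot product is linear. The offsets can therefore be folded into the observed spectrum once, after which each candidate is scored with a single dot product. The sketch below is an assumption and is not taken from Tide's code. It averages over the nonzero offsets `-max_offset..max_offset` (dividing by `2 * max_offset`), which matches one published fast-XCorr formulation; other SEQUEST versions normalize differently. It also assumes both spectra are binned to the same length.

```python
def preprocess(observed: list[float], max_offset: int = 75) -> list[float]:
    """Subtract from each bin the mean intensity over the nonzero offsets
    -max_offset..max_offset (bins outside the spectrum count as zero)."""
    n = len(observed)
    # Prefix sums make each windowed sum O(1).
    prefix = [0.0]
    for v in observed:
        prefix.append(prefix[-1] + v)
    out = []
    for i in range(n):
        lo, hi = max(0, i - max_offset), min(n, i + max_offset + 1)
        window = prefix[hi] - prefix[lo] - observed[i]  # exclude offset 0
        out.append(observed[i] - window / (2 * max_offset))
    return out


def xcorr_naive(theoretical: list[float], observed: list[float],
                max_offset: int = 75) -> float:
    """Direct definition: zero-offset dot product minus the mean of the
    dot products at every nonzero offset (equal-length inputs assumed)."""
    def shifted_dot(tau: int) -> float:
        return sum(t * observed[i + tau]
                   for i, t in enumerate(theoretical)
                   if 0 <= i + tau < len(observed))
    background = sum(shifted_dot(tau)
                     for tau in range(-max_offset, max_offset + 1) if tau != 0)
    return shifted_dot(0) - background / (2 * max_offset)


def xcorr_fast(theoretical: list[float], preprocessed: list[float]) -> float:
    """After preprocessing, XCorr is one dot product per candidate."""
    return sum(t * y for t, y in zip(theoretical, preprocessed))
```

The design point is that `preprocess` runs once per spectrum, while `xcorr_fast` runs once per candidate peptide. The per-candidate cost therefore no longer depends on the number of offsets.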
Figure 3
Performance of Tide compared to SEQUEST, Crux, OMSSA, Indexed SEQUEST (11/2009), and X!Tandem. Performance was measured in eight settings by varying the precursor mass tolerance window, the digest (fully tryptic or semi-tryptic candidate peptides), and the dataset (C. elegans, the "worm dataset", or S. cerevisiae, the "yeast dataset"; see Methods). Bar heights, on a log scale, show spectra processed per second, with numerical results given below. Each experiment except the X!Tandem experiments was repeated at least three times, and average timings are shown. Because SEQUEST runs relatively slowly, all SEQUEST experiments, as well as Crux experiments using semi-tryptic digestion, were performed with 100 randomly selected spectra. The remaining experiments, including all Tide experiments, were performed using the 10,000 benchmark spectra.
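The digest setting matters because a semi-tryptic peptide needs only one terminus at a trypsin cleavage site, which greatly enlarges the candidate set compared with a fully tryptic digest. The following sketch is illustrative and is not Tide's digestion code. It uses the common rule that trypsin cuts after K or R but not before P. The names `cleavage_sites` and `digest` are hypothetical, as are the `min_len` and `max_len` parameters. Missed cleavages are limited only by `max_len`.

```python
def cleavage_sites(protein: str) -> set[int]:
    """Boundaries where a tryptic peptide may start or end: the protein
    ends, plus every position after K/R that is not followed by P."""
    sites = {0, len(protein)}
    for i in range(len(protein) - 1):
        if protein[i] in "KR" and protein[i + 1] != "P":
            sites.add(i + 1)
    return sites


def digest(protein: str, semi: bool = False,
           min_len: int = 6, max_len: int = 50) -> set[str]:
    """Fully tryptic: both termini at cleavage sites.
    Semi-tryptic: at least one terminus at a cleavage site."""
    sites = cleavage_sites(protein)
    n = len(protein)
    peptides = set()
    for start in range(n):
        for end in range(start + min_len, min(n, start + max_len) + 1):
            n_ok, c_ok = start in sites, end in sites
            if (n_ok and c_ok) or (semi and (n_ok or c_ok)):
                peptides.add(protein[start:end])
    return peptides
```

With the toy sequence `"GGGGGKLLLLLLR"`, the fully tryptic digest gives three peptides. The semi-tryptic digest is a strict superset of those three and also includes peptides such as `"GGGGGKL"`.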
Figure 4
Performance of Tide compared to SEQUEST and Indexed SEQUEST (11/2009) on benchmark datasets with variable modifications. Bar heights, on a log scale, show the number of spectra processed per second. The benchmark datasets are the same as in Figure 3, but searches allowed up to two phosphorylated serine, threonine, or tyrosine residues per peptide. Tests were run with a ±3.0 Da mass window and full tryptic digestion. As in Figure 3, SEQUEST experiments were run with 100 randomly selected spectra, whereas Tide experiments used the 10,000 benchmark spectra.
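Variable modifications multiply the number of candidate peptides each spectrum is compared against. The sketch below enumerates the phosphorylation variants described here: up to two modified S, T, or Y residues per peptide. It assumes a monoisotopic shift of +79.966331 Da (HPO3) per phosphate. `phospho_variants` and `base_mass` are hypothetical names. The sketch only enumerates variants and does not reflect how Tide stores or indexes them.

```python
from collections.abc import Iterator
from itertools import combinations

PHOSPHO_DELTA = 79.966331  # monoisotopic mass of HPO3, in Da


def phospho_variants(peptide: str, base_mass: float,
                     max_mods: int = 2) -> Iterator[tuple[tuple[int, ...], float]]:
    """Yield (modified positions, modified mass) for every way of
    phosphorylating up to max_mods of the peptide's S/T/Y residues."""
    sty = [i for i, aa in enumerate(peptide) if aa in "STY"]
    for k in range(min(max_mods, len(sty)) + 1):
        for sites in combinations(sty, k):  # k == 0 yields the unmodified form
            yield sites, base_mass + k * PHOSPHO_DELTA
```

A peptide with three S/T/Y residues already has 1 + 3 + 3 = 7 forms at this limit. This growth helps explain why searches with modifications examine many more candidates per spectrum.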
Figure 5. Comparison of XCorr scores from Tide and from two versions of SEQUEST
From each of two datasets (yeast and worm), 100 spectra were selected at random for analysis by SEQUEST and by Tide. Searches were performed against a database of tryptic peptides from the corresponding organism and allowed up to two phosphorylations per peptide at serine, threonine, or tyrosine (STY) residues. The figure includes the top five peptide-spectrum matches (PSMs) per spectrum, as reported by SEQUEST. For each PSM, we plot the SEQUEST XCorr against the XCorr computed by Tide. The bottom panels instead plot the SEQUEST 1993 XCorr scores against those computed by SEQUEST 2009.