Rapid and accurate peptide identification from tandem mass spectra

J Proteome Res. 2008 Jul;7(7):3022-7. doi: 10.1021/pr800127y. Epub 2008 May 28.

Abstract

Mass spectrometry, the core technology in the field of proteomics, promises to enable scientists to identify and quantify the entire complement of proteins in a complex biological sample. Currently, the primary bottleneck in this type of experiment is computational. Existing algorithms for interpreting mass spectra are slow and fail to identify a large proportion of the given spectra. We describe a database search program called Crux that reimplements and extends the widely used database search program Sequest. For speed, Crux uses a peptide indexing scheme to rapidly retrieve candidate peptides for a given spectrum. For each peptide in the target database, Crux generates shuffled decoy peptides on the fly, providing a good null model and, hence, accurate false discovery rate estimates. Crux also implements two recently described postprocessing methods: a p-value calculation based upon fitting a Weibull distribution to the observed scores, and a semisupervised method that learns to discriminate between target and decoy matches. Both methods significantly improve the overall rate of peptide identification. Crux is implemented in C and is freely distributed, with source code, to noncommercial users.
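The three ideas named in the abstract (a peptide index for candidate retrieval, shuffled decoys, and a target-decoy false discovery rate estimate) can be illustrated with a minimal Python sketch. This is not Crux's implementation, which is written in C and whose index layout the abstract does not describe. The sorted-by-mass index, the choice to keep the final residue fixed when shuffling, the simple decoys-over-targets FDR ratio, and all function names here are assumptions made for illustration.

```python
import bisect
import random

# Monoisotopic residue masses in daltons (small subset, for illustration only).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
    "E": 129.04259, "R": 156.10111,
}
WATER_MASS = 18.01056  # added once per peptide for the terminal H and OH


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER_MASS


def build_index(peptides):
    """Hypothetical index: (mass, peptide) pairs sorted by mass."""
    return sorted((peptide_mass(p), p) for p in peptides)


def candidates(index, precursor_mass, tolerance):
    """Binary-search the index for peptides within +/- tolerance of the mass."""
    lo = bisect.bisect_left(index, (precursor_mass - tolerance, ""))
    hi = bisect.bisect_right(index, (precursor_mass + tolerance, "\uffff"))
    return [p for _, p in index[lo:hi]]


def shuffled_decoy(peptide, rng):
    """Shuffle every residue except the last (an assumed convention).

    The decoy keeps the target's amino acid composition, and therefore
    its mass, so it competes for the same spectra as the target.
    """
    if len(peptide) < 3:
        return peptide  # too short to shuffle meaningfully
    body = list(peptide[:-1])
    rng.shuffle(body)
    return "".join(body) + peptide[-1]


def estimate_fdr(target_scores, decoy_scores, threshold):
    """Simple target-decoy estimate: decoys passing / targets passing."""
    targets = sum(s >= threshold for s in target_scores)
    decoys = sum(s >= threshold for s in decoy_scores)
    if targets == 0:
        return 0.0
    return min(1.0, decoys / targets)


if __name__ == "__main__":
    index = build_index(["GASK", "PEPTLK", "GAR"])
    print(candidates(index, peptide_mass("GASK"), 0.01))
    print(shuffled_decoy("PEPTLK", random.Random(0)))
    print(estimate_fdr([5, 4, 3, 1], [2, 3.5, 0.5, 0.2], threshold=3))
```

Because a shuffled decoy has the same mass as its target, it can be generated at query time from the retrieved candidates rather than stored in a separate decoy database, which is one plausible reading of "on the fly" in the abstract.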

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Computational Biology
  • Databases, Factual
  • Humans
  • Peptide Fragments / analysis
  • Peptides / analysis*
  • Proteomics
  • Saccharomyces cerevisiae Proteins / analysis
  • Software
  • Tandem Mass Spectrometry

Substances

  • Peptide Fragments
  • Peptides
  • Saccharomyces cerevisiae Proteins