Statistical model for large-scale peptide identification in databases from tandem mass spectra using SEQUEST

Anal Chem. 2004 Dec 1;76(23):6853-60. doi: 10.1021/ac049305c.


Recent technological advances have made multidimensional peptide separation techniques coupled with tandem mass spectrometry the method of choice for high-throughput identification of proteins. These advances create a pressing need for software tools that perform large-scale, fully automated, unambiguous peptide identification. In this work, we use the nuclear proteome of Jurkat cells as a model and present a processing algorithm that allows accurate prediction of random matching distributions, based on the two SEQUEST scores Xcorr and DeltaCn. Our method permits a very simple and precise calculation of the probabilities associated with individual peptide assignments, as well as of the false discovery rate among the peptides identified in any experiment. A further mathematical analysis demonstrates that the score distributions are highly dependent on database size and precursor mass window, and suggests that the probability associated with SEQUEST scores depends on the number of candidate peptide sequences available for the search. Our results highlight the importance of adjusting the filtering criteria used to discriminate between correct and incorrect peptide sequences according to the circumstances of each particular experiment.
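The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a random-match (null) score distribution is already available as a sorted list of scores, one best random score per spectrum, and that random candidates behave independently. All function names and parameters are hypothetical.

```python
from bisect import bisect_left


def tail_probability(score, null_scores_sorted):
    """Fraction of null (random-match) scores that are >= score.

    Assumes null_scores_sorted is a non-empty list in ascending order.
    """
    n = len(null_scores_sorted)
    first_at_or_above = bisect_left(null_scores_sorted, score)
    return (n - first_at_or_above) / n


def best_of_n_probability(p_single, n_candidates):
    """Chance that at least one of n independent random candidates
    reaches a score whose single-candidate tail probability is p_single.

    Illustrates why the same score becomes less significant as the
    number of candidate sequences grows (larger database or wider
    precursor mass window). Independence is an assumption of this sketch.
    """
    return 1.0 - (1.0 - p_single) ** n_candidates


def estimated_fdr(threshold, observed_scores, null_scores_sorted):
    """Expected number of random matches at or above the threshold,
    divided by the number of accepted assignments.

    Conservative assumption: every spectrum could produce a random
    match drawn from the null distribution.
    """
    accepted = sum(1 for s in observed_scores if s >= threshold)
    if accepted == 0:
        return 0.0
    expected_false = len(observed_scores) * tail_probability(
        threshold, null_scores_sorted
    )
    return min(1.0, expected_false / accepted)
```

In this sketch the acceptance threshold would be raised until `estimated_fdr` falls below the rate tolerated for the experiment, and `best_of_n_probability` shows how that threshold has to move when the candidate count changes.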

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Chromatography, Liquid / methods
  • Databases, Protein*
  • Electrophoresis, Gel, Two-Dimensional / methods
  • Humans
  • Jurkat Cells
  • Models, Statistical*
  • Peptide Fragments / chemistry*
  • Proteome / analysis
  • Sensitivity and Specificity
  • Software*
  • Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization / instrumentation
  • Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization / methods*


Substances

  • Peptide Fragments
  • Proteome