Large-scale database searching using tandem mass spectra: looking up the answer in the back of the book

Nat Methods. 2004 Dec;1(3):195-202. doi: 10.1038/nmeth725.


Database searching is an essential element of large-scale proteomics. Because these methods are widely used, it is important to understand the rationale behind the algorithms. Most algorithms are based on concepts first developed in SEQUEST and PeptideSearch. Four basic approaches are used to determine a match between a spectrum and a sequence: descriptive, interpretative, stochastic and probability-based matching. We review the basic concepts used by most search algorithms, the computational modeling of peptide identification, and the current challenges and limitations of database searching for protein identification.
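
To make the idea of matching a spectrum against a sequence concrete, here is a minimal sketch in Python. It is not the scoring used by SEQUEST, PeptideSearch, or any other tool named above. It assumes a deliberately simple stand-in, a shared peak count between observed peaks and the theoretical singly charged b- and y-ions of each candidate peptide, after filtering candidates by precursor mass. The function names (`fragment_mz`, `shared_peak_count`, `best_match`) and the tolerance defaults are hypothetical choices made for illustration.

```python
# Monoisotopic residue masses in daltons (standard values, unmodified residues).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton, Da
WATER = 18.010565   # mass of H2O, Da


def fragment_mz(peptide):
    """Return sorted m/z values of singly charged b- and y-ions."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = []
    for i in range(1, len(masses)):
        ions.append(sum(masses[:i]) + PROTON)          # b-ion: N-terminal prefix
        ions.append(sum(masses[i:]) + WATER + PROTON)  # y-ion: C-terminal suffix
    return sorted(ions)


def shared_peak_count(observed_mz, peptide, tol=0.5):
    """Count theoretical fragments that have an observed peak within tol."""
    return sum(
        any(abs(obs - theo) <= tol for obs in observed_mz)
        for theo in fragment_mz(peptide)
    )


def best_match(observed_mz, precursor_mass, candidates,
               precursor_tol=1.0, tol=0.5):
    """Filter candidates by neutral precursor mass, then rank by shared peaks.

    Returns a (score, peptide) tuple, or None if no candidate passes the filter.
    """
    scored = []
    for pep in candidates:
        neutral_mass = sum(RESIDUE_MASS[aa] for aa in pep) + WATER
        if abs(neutral_mass - precursor_mass) <= precursor_tol:
            scored.append((shared_peak_count(observed_mz, pep, tol), pep))
    return max(scored) if scored else None
```

As a usage illustration, a spectrum simulated with `fragment_mz("PEPTIDE")` and searched against the candidates `PEPTIDE` and `PEPTIED` (which have the same precursor mass) ranks `PEPTIDE` first, because the swapped residues change one b-ion and one y-ion. Real search engines replace the peak count with more discriminating scores and add a statistical assessment of the match.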

Publication types

  • Research Support, U.S. Gov't, P.H.S.
  • Review

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Animals
  • Computer Simulation
  • Database Management Systems*
  • Databases, Protein*
  • Humans
  • Information Storage and Retrieval / methods*
  • Mass Spectrometry / methods*
  • Models, Chemical
  • Models, Statistical
  • Molecular Sequence Data
  • Proteins / analysis
  • Proteins / chemistry*
  • Proteomics / methods
  • Sequence Alignment / methods
  • Sequence Analysis, Protein / methods*
  • User-Computer Interface

Substances

  • Proteins