Computing exact p-values for a cross-correlation shotgun proteomics score function

Mol Cell Proteomics. 2014 Sep;13(9):2467-79. doi: 10.1074/mcp.O113.036327. Epub 2014 Jun 2.


The core of every protein mass spectrometry analysis pipeline is a function that assesses the quality of a match between an observed spectrum and a candidate peptide. We describe a procedure for computing exact p-values for the oldest and still widely used score function, SEQUEST XCorr. The procedure uses dynamic programming to efficiently enumerate the full distribution of scores for all possible peptides whose masses are close to the spectrum's precursor mass. Ranking identified spectra by p-value rather than by XCorr significantly reduces the variance caused by spectrum-specific effects on the score. In combination with the Percolator postprocessor, the XCorr p-value yields more spectrum and peptide identifications at a fixed false discovery rate than Mascot, X!Tandem, Comet, and MS-GF+ across a variety of data sets.
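The dynamic-programming idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on simplifying assumptions that are not in the abstract: masses are already discretized to integer bins, the score is an integer sum of per-bin evidence for a single prefix ion series, and every residue sequence is counted with equal weight. The names `score_distribution`, `p_value`, `evidence`, and `residue_masses` are hypothetical.

```python
from collections import defaultdict


def score_distribution(evidence, residue_masses, target_mass):
    """Count residue sequences of total mass `target_mass` by integer score.

    evidence[m] is the (assumed) integer score gained when a peptide has a
    prefix cleavage at integer mass m. The full-length prefix is the whole
    peptide rather than a cleavage, so it contributes nothing.
    """
    # counts[m][s] = number of sequences with total mass m and score s
    counts = [defaultdict(int) for _ in range(target_mass + 1)]
    counts[0][0] = 1  # the empty prefix: one way, score zero
    for m in range(1, target_mass + 1):
        gain = evidence[m] if m < target_mass else 0
        for r in residue_masses:
            if r > m:
                continue  # this residue cannot be the last one at mass m
            # extend every sequence of mass m - r by one residue of mass r
            for s, n in counts[m - r].items():
                counts[m][s + gain] += n
    return dict(counts[target_mass])


def p_value(dist, observed_score):
    """Fraction of enumerated sequences scoring at least `observed_score`."""
    total = sum(dist.values())
    if total == 0:
        raise ValueError("no residue sequence reaches the target mass")
    tail = sum(n for s, n in dist.items() if s >= observed_score)
    return tail / total
```

With a toy alphabet of residue masses 1 and 2, a target mass of 4, and `evidence = [0, 3, 0, 1, 0]`, the table covers five sequences, two of which reach the top score of 4, so `p_value(dist, 4)` is 0.4. The table has one row per mass bin and one entry per reachable score, so the work grows with the number of bins and distinct scores rather than with the number of candidate peptides.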

MeSH terms

  • Algorithms
  • Caenorhabditis elegans Proteins / metabolism
  • Databases, Protein
  • Humans
  • Myocardium / metabolism
  • Peptides / chemistry
  • Proteomics / statistics & numerical data*
  • Saccharomyces cerevisiae Proteins / metabolism


Substances

  • Caenorhabditis elegans Proteins
  • Peptides
  • Saccharomyces cerevisiae Proteins