Intensity-based protein identification by machine learning from a library of tandem mass spectra

Nat Biotechnol. 2004 Feb;22(2):214-9. doi: 10.1038/nbt930. Epub 2004 Jan 18.


Tandem mass spectrometry (MS/MS) has emerged as a cornerstone of proteomics owing in part to robust spectral interpretation algorithms. Widely used algorithms do not fully exploit the intensity patterns present in mass spectra. Here, we demonstrate that intensity pattern modeling improves peptide and protein identification from MS/MS spectra. We modeled fragment ion intensities using a machine-learning approach that estimates the likelihood of observed intensities given peptide and fragment attributes. From 1,000,000 spectra, we chose 27,000 with high-quality, nonredundant matches as training data. Using the same 27,000 spectra, intensity was similarly modeled with mismatched peptides. We used these two probabilistic models to compute the relative likelihood of an observed spectrum given that a candidate peptide is matched or mismatched. We used a 'decoy' proteome approach to estimate incorrect match frequency, and demonstrated that an intensity-based method reduces peptide identification error by 50-96% without any loss in sensitivity.
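The scoring idea in the abstract, comparing the likelihood of the observed intensities under a "matched" model against a "mismatched" model and then using decoy matches to estimate error, can be illustrated with a minimal sketch. This is not the authors' implementation. The discretization of intensities into bins, the fragment labels, the probability tables, and the function names `log_likelihood_ratio` and `decoy_error_rate` are all assumptions made for illustration, as is the premise that the decoy proteome is comparable in size to the real one.

```python
import math


def log_likelihood_ratio(observed_bins, p_matched, p_mismatched):
    """Sum, over fragments, of log P(bin | matched) - log P(bin | mismatched).

    observed_bins: dict mapping a fragment label to its observed intensity bin
                   (hypothetical discretization, not taken from the paper).
    p_matched / p_mismatched: dict mapping a fragment label to a list of
                   per-bin probabilities under each model (toy values).
    """
    score = 0.0
    for fragment, bin_index in observed_bins.items():
        score += math.log(p_matched[fragment][bin_index])
        score -= math.log(p_mismatched[fragment][bin_index])
    return score


def decoy_error_rate(target_scores, decoy_scores, threshold):
    """Estimate the fraction of accepted target matches that are incorrect.

    Assumes decoy hits at or above the threshold occur about as often as
    incorrect target hits, so their count stands in for the number of errors.
    """
    n_target = sum(s >= threshold for s in target_scores)
    n_decoy = sum(s >= threshold for s in decoy_scores)
    return n_decoy / n_target if n_target else 0.0


# Toy example: two fragment types, two intensity bins (0 = low, 1 = high).
p_matched = {"b": [0.2, 0.8], "y": [0.1, 0.9]}
p_mismatched = {"b": [0.7, 0.3], "y": [0.6, 0.4]}
observed = {"b": 1, "y": 1}

score = log_likelihood_ratio(observed, p_matched, p_mismatched)
error = decoy_error_rate([3.0, 2.0, 1.0, 0.5], [2.5, 0.1, 0.2, 0.3], threshold=1.0)
```

A positive score means the observed intensities are more probable under the matched model than under the mismatched one; raising the threshold trades sensitivity for a lower estimated error rate.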

Publication types

  • Evaluation Study
  • Letter
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Artificial Intelligence
  • Likelihood Functions
  • Mass Spectrometry / methods*
  • Molecular Sequence Data
  • Pattern Recognition, Automated
  • Peptide Library*
  • Proteins / analysis
  • Proteins / chemistry*
  • Proteins / classification*
  • Proteomics / methods
  • Sequence Alignment / methods
  • Sequence Analysis, Protein / methods*


Substances

  • Peptide Library
  • Proteins