Potential for false positive identifications from large databases through tandem mass spectrometry

J Proteome Res. Sep-Oct 2004;3(5):1082-5. doi: 10.1021/pr049946o.


The biomedical research community at large is increasingly employing shotgun proteomics for large-scale identification of proteins from enzymatic digests. Typically, the approach used to identify proteins and peptides from tandem mass spectral data is based on the matching of experimentally generated tandem mass spectra to the theoretical best match from a protein database. Here, we present the potential difficulties of using such an approach without statistical consideration of the false positive rate, especially when large databases, such as those encountered in eukaryotes, are considered. This is illustrated by searching a dataset generated from a multidimensional separation of a eukaryotic tryptic digest against an in silico generated random protein database, which produced a significant number of positive matches even when previously suggested score filtering criteria were applied.
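The random-database comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the random database was built or how the false positive rate was computed, so everything here is an assumption for illustration: the function names, the uniform amino-acid composition used by default, the generic "higher is better" match score, and the simple ratio of random-database hits to real-database hits above a score threshold.

```python
import random

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_protein_database(lengths, weights=None, seed=0):
    """Build one random sequence per requested length.

    `lengths` would typically mirror the protein lengths of the real
    database so both databases have a comparable size. `weights`
    (hypothetical parameter) lets the caller supply amino-acid
    frequencies; None means a uniform composition.
    """
    rng = random.Random(seed)
    return [
        "".join(rng.choices(AMINO_ACIDS, weights=weights, k=n))
        for n in lengths
    ]


def estimated_false_positive_fraction(random_db_scores, real_db_scores, threshold):
    """Ratio of random-database matches to real-database matches
    that score at or above `threshold`.

    Any match to a random sequence is false by construction, so this
    ratio gives a rough estimate of how many real-database matches at
    the same threshold could be false. Returns NaN when no
    real-database match passes the threshold.
    """
    random_hits = sum(score >= threshold for score in random_db_scores)
    real_hits = sum(score >= threshold for score in real_db_scores)
    return random_hits / real_hits if real_hits else float("nan")
```

In use, the same spectra would be searched against both databases with the same search engine and settings, and the ratio recomputed at each candidate score filter. A filter that still lets through many random-database matches is too permissive for a database of that size.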

Publication types

  • Letter
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Computational Biology / standards
  • Computational Biology / statistics & numerical data*
  • Databases, Protein*
  • Isoelectric Point
  • Male
  • Mass Spectrometry / standards
  • Mass Spectrometry / statistics & numerical data
  • Peptide Fragments / analysis
  • Proteins / analysis
  • Proteomics / statistics & numerical data*
  • Rats
  • Testis / chemistry

Substances

  • Peptide Fragments
  • Proteins