A refined method to calculate false discovery rates for peptide identification using decoy databases

J Proteome Res. 2009 Apr;8(4):1792-6. doi: 10.1021/pr800362h.


Using decoy databases to estimate the number of false positive assignments is one of the most widely used methods to calculate false discovery rates in large-scale peptide identification studies. However, in spite of its widespread use, the decoy approach has not been fully standardized. Decoy databases may be searched separately from target databases or concatenated with them, which allows a competition strategy; depending on the method used, two alternative formulations are possible to calculate error rates. Although both methods are conservative, the separate database approach overestimates the number of false positive assignments because of the presence of MS/MS spectra produced by true peptides, while the concatenated approach calculates the error rate in a population that is larger than that obtained after searching a target database alone. In this work, we demonstrate that by analyzing the joint distribution of matches obtained from a separate database search as a whole, and applying the competition strategy, it is possible to calculate false discovery rates more accurately. We show that, for several kinds of scores, both the separate and the concatenated approaches clearly overestimate error rates with respect to those calculated by the new algorithm. We conclude that the new indicator provides a more sensitive alternative and establishes, for the first time, a unique and integrated framework to calculate error rates in large-scale peptide identification studies.
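The two conventional formulations contrasted in the abstract can be illustrated with a minimal Python sketch. It assumes the forms in which they are commonly written, D/T for separate searches and 2D/(T+D) for a concatenated search, where T and D are the numbers of target and decoy matches at or above a score threshold. The abstract does not give these formulas, and the sketch does not implement the refined algorithm it describes. The function names, the higher-is-better score convention, and the tie rule (ties go to the target) are hypothetical choices made for illustration.

```python
from typing import Sequence


def separate_fdr(target_scores: Sequence[float],
                 decoy_scores: Sequence[float],
                 threshold: float) -> float:
    """Separate-search estimate, assumed form D / T.

    Each spectrum is scored independently against the target and the
    decoy database; every score at or above the threshold is counted.
    """
    t = sum(1 for s in target_scores if s >= threshold)
    d = sum(1 for s in decoy_scores if s >= threshold)
    return d / t if t else 0.0


def concatenated_fdr(target_scores: Sequence[float],
                     decoy_scores: Sequence[float],
                     threshold: float) -> float:
    """Concatenated-search estimate, assumed form 2D / (T + D).

    Competition is emulated per spectrum: only the better of the target
    and decoy matches is kept (ties go to the target, an assumption).
    """
    t = d = 0
    for ts, ds in zip(target_scores, decoy_scores):
        if max(ts, ds) < threshold:
            continue  # neither match passes the threshold
        if ts >= ds:
            t += 1
        else:
            d += 1
    return 2 * d / (t + d) if (t + d) else 0.0


if __name__ == "__main__":
    # Toy scores for six spectra (made-up values, not data from the paper).
    targets = [5.0, 4.0, 3.0, 1.0, 2.5, 6.0]
    decoys = [1.0, 4.5, 0.5, 3.5, 1.0, 3.2]
    print(separate_fdr(targets, decoys, 3.0))
    print(concatenated_fdr(targets, decoys, 3.0))
```

In this sketch the concatenated denominator includes decoy winners, which reflects the abstract's remark that the concatenated approach evaluates the error rate in a larger population than a target-only search.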

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Artificial Intelligence*
  • Databases, Protein*
  • False Positive Reactions
  • Peptides / analysis*
  • Software*


  • Peptides