The decoy-database approach is currently the gold standard for assessing the confidence of identifications in shotgun proteomic experiments. Here, we demonstrate that what appears to be a good result under the decoy-database approach at a given false-discovery rate may, in fact, be the product of overfitting. This problem has been overlooked until now and can lead to inflated identification numbers whose reliability does not correspond to the expected false-discovery rate. To overcome this, we introduce a modified version of the method, termed the semi-labeled decoy approach, which makes it possible to determine statistically whether a result is overfitted.
Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.