Re-evaluation of the 18 non-human protein standards used to create the empirical statistical model for decoy library searching

Anal Biochem. 2020 Jun 15;599:113680. doi: 10.1016/j.ab.2020.113680. Epub 2020 Mar 16.

Abstract

The Empirical Statistical Model (ESM) for decoy library searching fused the expected amino acid sequences of 18 non-human protein standards to a human decoy library. The ESM assumed a priori that the standards were pure, such that only the 18 nominal proteins were true positives and all other proteins were false positives, that there was no overlap between the peptides of non-human and human proteins, and that the score distribution of individual peptides would resolve true positive from false positive results or noise. The results of random and independent sampling by LC-ESI-MS/MS indicated that the fundamental assumptions of the ESM were not in good agreement with the actual purity of the commercial test standards, and so the method showed a 99.7% false negative rate. The ESM for decoy library searching apparently showed poor agreement with SDS-PAGE using silver staining, goodness of fit of MS/MS spectra by X!TANDEM, false discovery rate (FDR) correction by Benjamini and Hochberg, and comparison to the observation frequency of null random MS/MS spectra, all of which confirmed that the standards contain hundreds of proteins with a low FDR of primary structural identification. The protein observation frequency increased with abundance, and the log10 precursor intensity distributions were Gaussian and nearly ideal for relative quantification.
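For readers unfamiliar with the Benjamini and Hochberg correction named above, the following is a minimal sketch of the standard step-up procedure. It is not the authors' pipeline: the function name `benjamini_hochberg`, the `alpha` level of 0.05, and the p-values are all hypothetical and chosen only for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if it is accepted as a
    discovery under the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank

    # Every p-value ranked at or below max_k is accepted, even if it
    # individually failed its own threshold (the "step-up" property).
    accepted = [False] * m
    for idx in order[:max_k]:
        accepted[idx] = True
    return accepted


# Hypothetical p-values for ten candidate protein identifications.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042,
             0.060, 0.074, 0.205, 0.212, 0.216]
example_accepted = benjamini_hochberg(example_p, alpha=0.05)

# Hypothetical case showing the step-up behaviour.
step_up_accepted = benjamini_hochberg([0.01, 0.04, 0.03, 0.045], alpha=0.05)
```

In the first example only the two smallest p-values pass. In the second, the largest p-value meets its threshold, so all four are accepted even though the middle two fail their own rank thresholds. The procedure controls the expected proportion of false discoveries among the accepted identifications rather than the chance of any single false positive.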

Keywords: LC-ESI-MS/MS; MS/MS correlation; Protein standards; SDS-PAGE; Silver stain; Type I error rate.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Databases, Protein*
  • Humans
  • Proteins / standards*
  • Reference Standards
  • Tandem Mass Spectrometry

Substances

  • Proteins