False Discovery Rate Estimation in Proteomics

Methods Mol Biol. 2016:1362:119-28. doi: 10.1007/978-1-4939-3106-4_7.

Abstract

With advances in proteomics separation techniques and improvements in mass analyzers, the volume of data generated in a mass spectrometry-based proteomics experiment is rising exponentially. Such voluminous datasets necessitate automated computational tools for high-throughput data analysis and appropriate statistical control. The data are searched using one or more of several popular database search algorithms. The matches assigned by these tools can include false positives, so statistical validation of the matches is necessary before making any biological interpretation. Without such validation, biological inferences may not hold and can be outright misleading. The scores of correct and incorrect matches overlap considerably, so no single score threshold separates them cleanly. To control the false positives among a set of accepted matches, a statistical estimate is needed that reflects the proportion of false positives in the processed data. The false discovery rate (FDR) is the metric for global confidence assessment of a large-scale proteomics dataset. This chapter covers the basics of FDR, its application in proteomics, and methods to estimate FDR.
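
As a rough illustration of what an FDR estimate looks like in practice, the sketch below applies a simple target-decoy calculation, one commonly used approach and not necessarily the formulation the chapter itself presents. It assumes separate searches against target and decoy databases of equal size, that a higher score means a better match, and that decoy hits above a threshold approximate the number of false target hits. The function name `target_decoy_fdr` and all scores are hypothetical.

```python
def target_decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate the FDR among target matches scoring at or above `threshold`.

    Assumption: the number of decoy matches passing the threshold
    approximates the number of false target matches passing it.
    """
    n_target = sum(score >= threshold for score in target_scores)
    n_decoy = sum(score >= threshold for score in decoy_scores)
    if n_target == 0:
        # No accepted target matches, so there is nothing to be wrong about.
        return 0.0
    # Cap at 1.0 because a rate above 100% is not meaningful.
    return min(1.0, n_decoy / n_target)


# Made-up match scores, for illustration only.
target_scores = [9.1, 8.7, 8.2, 7.9, 7.5, 6.8, 6.1, 5.4, 4.9, 4.2]
decoy_scores = [6.5, 5.0, 4.8, 4.1, 3.9, 3.2, 2.8, 2.5, 2.0, 1.7]

for threshold in (7.0, 6.0, 4.0):
    fdr = target_decoy_fdr(target_scores, decoy_scores, threshold)
    print(f"threshold {threshold}: estimated FDR {fdr:.3f}")
```

Lowering the threshold accepts more target matches but also lets more decoys through, so the estimated FDR rises. That trade-off follows from the overlap between correct and incorrect match scores.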

Keywords: False discovery rate; Peptide spectrum matches; Posterior error probability; Shotgun proteomics; Statistical validation; Target-decoy.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Models, Statistical
  • Proteomics / methods*
  • Proteomics / standards*
  • Reproducibility of Results