False Discovery Rate Estimation in Proteomics

Suruchi Aggarwal; Amit Kumar Yadav

doi:10.1007/978-1-4939-3106-4_7

Methods Mol Biol. 2016;1362:119-28. doi: 10.1007/978-1-4939-3106-4_7.

Authors

Suruchi Aggarwal¹, Amit Kumar Yadav²

Affiliations

¹ Immunology Group, International Centre for Genetic Engineering and Biotechnology, ICGEB Campus, Aruna Asaf Ali Marg, New Delhi, 110067, India.
² Drug Discovery Research Center (DDRC), Translational Health Science and Technology Institute, NCR Biotech Science Cluster, 3rd Milestone, Faridabad-Gurgaon Expressway, Faridabad, 122001, Haryana, India. amit.yadav@thsti.res.in.

PMID: 26519173

Abstract

With advances in proteomics separation techniques and improvements in mass analyzers, the volume of data generated in a mass spectrometry-based proteomics experiment is rising exponentially. Such voluminous datasets necessitate automated computational tools for high-throughput data analysis and appropriate statistical control. The data are searched using one or more of several popular database search algorithms. The matches assigned by these tools can include false positives, so the matches must be statistically validated before any biological interpretation is made. Without such validation, biological inferences may not hold true and can be outright misleading. True and false positives overlap considerably, so no single cutoff separates them cleanly. Controlling the false positives among a set of accepted matches therefore requires a statistical estimate of the proportion of false positives in the processed data. The false discovery rate (FDR) is the metric for global confidence assessment of a large-scale proteomics dataset. This chapter covers the basics of FDR, its application in proteomics, and methods to estimate FDR.
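As a rough illustration of what estimating FDR over a set of accepted matches can look like, the sketch below applies the target-decoy idea named in the keywords. It is a minimal example under stated assumptions, not the chapter's own procedure: it assumes a concatenated target-decoy search, scores where higher is better, and the simple estimator FDR ≈ (decoys accepted) / (targets accepted). The function names `target_decoy_qvalues` and `accepted_targets` and the toy scores are hypothetical.

```python
from typing import List, Tuple

Psm = Tuple[float, bool]  # (score, is_decoy); higher score = better match (assumption)


def target_decoy_qvalues(psms: List[Psm]) -> List[Tuple[float, bool, float]]:
    """Return (score, is_decoy, q_value) rows sorted by descending score.

    Assumes a concatenated target-decoy search and the simple estimator
    FDR = decoys / targets at each score threshold. Ties are not handled.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # Running FDR estimate when accepting every match down to each rank.
    fdrs = []
    targets = decoys = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(min(1.0, decoys / max(targets, 1)))

    # q-value: smallest FDR at this threshold or any more permissive one.
    qvalues = []
    running_min = 1.0
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvalues.append(running_min)
    qvalues.reverse()

    return [(s, d, q) for (s, d), q in zip(ranked, qvalues)]


def accepted_targets(psms: List[Psm], max_fdr: float) -> List[float]:
    """Scores of target matches whose q-value is at or below max_fdr."""
    rows = target_decoy_qvalues(psms)
    return [s for s, is_decoy, q in rows if not is_decoy and q <= max_fdr]


# Toy data (invented for illustration only).
toy = [(9.0, False), (8.0, False), (7.0, True),
       (6.0, False), (5.0, False), (4.0, True)]
print(accepted_targets(toy, 0.25))
```

The q-value step makes the accepted set monotone in the threshold: a match is kept if any cutoff at or below its score achieves the requested FDR, which avoids discarding good matches because of a local spike in the running estimate.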

Keywords: False discovery rate; Peptide spectrum matches; Posterior error probability; Shotgun proteomics; Statistical validation; Target-decoy.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Models, Statistical
Proteomics / methods*
Proteomics / standards*
Reproducibility of Results