Improving sensitivity in proteome studies by analysis of false discovery rates for multiple search engines

Andrew R Jones; Jennifer A Siepen; Simon J Hubbard; Norman W Paton

doi:10.1002/pmic.200800473

Proteomics. 2009 Mar;9(5):1220-9. doi: 10.1002/pmic.200800473.

Authors

Andrew R Jones¹, Jennifer A Siepen, Simon J Hubbard, Norman W Paton

Affiliation

¹ Department of Preclinical Veterinary Science, Faculty of Veterinary Science, University of Liverpool, Liverpool, UK. andrew.jones@liv.ac.uk

Abstract

LC-MS experiments can generate large quantities of data, for which a variety of database search engines are available to make peptide and protein identifications. Decoy databases are becoming widely used to place statistical confidence in result sets, allowing the false discovery rate (FDR) to be estimated. Different search engines produce different identification sets so employing more than one search engine could result in an increased number of peptides (and proteins) being identified, if an appropriate mechanism for combining data can be defined. We have developed a search engine independent score, based on FDR, which allows peptide identifications from different search engines to be combined, called the FDR Score. The results demonstrate that the observed FDR is significantly different when analysing the set of identifications made by all three search engines, by each pair of search engines or by a single search engine. Our algorithm assigns identifications to groups according to the set of search engines that have made the identification, and re-assigns the score (combined FDR Score). The combined FDR Score can differentiate between correct and incorrect peptide identifications with high accuracy, allowing on average 35% more peptide identifications to be made at a fixed FDR than using a single search engine.
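
The two ideas the abstract relies on, decoy-based FDR estimation and grouping identifications by the set of search engines that reported them, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the estimator (decoy hits divided by target hits at each rank), the function names, the `(spectrum_id, peptide, engine)` tuple layout, and the engine labels are all assumptions made for illustration, and the abstract does not specify how the combined FDR Score itself is calculated.

```python
from collections import defaultdict


def decoy_fdr(is_decoy_flags):
    """Running FDR estimate for identifications sorted best score first.

    Assumption: FDR at each rank is estimated as decoys / targets seen so far.
    """
    fdrs = []
    decoys = 0
    targets = 0
    for is_decoy in is_decoy_flags:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # With no target hits yet, report the most pessimistic value.
        fdrs.append(decoys / targets if targets else 1.0)
    return fdrs


def to_q_values(fdrs):
    """Make the estimate monotone: each rank takes the lowest FDR found
    at that rank or at any later (worse-scoring) rank."""
    q = list(fdrs)
    for i in range(len(q) - 2, -1, -1):
        q[i] = min(q[i], q[i + 1])
    return q


def group_by_engines(hits):
    """Group identifications by the set of engines that reported them.

    hits: iterable of (spectrum_id, peptide, engine) tuples (hypothetical layout).
    Returns {frozenset_of_engines: [(spectrum_id, peptide), ...]}.
    """
    engines_for = defaultdict(set)
    for spectrum_id, peptide, engine in hits:
        engines_for[(spectrum_id, peptide)].add(engine)

    groups = defaultdict(list)
    for identification, engines in engines_for.items():
        groups[frozenset(engines)].append(identification)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical ranked list from one engine: True marks a decoy hit.
    ranked = [False, False, True, False]
    print(to_q_values(decoy_fdr(ranked)))

    # Hypothetical hits from three engines labelled A, B and C.
    hits = [
        ("scan1", "PEPTIDEA", "A"),
        ("scan1", "PEPTIDEA", "B"),
        ("scan1", "PEPTIDEA", "C"),
        ("scan2", "PEPTIDEB", "A"),
    ]
    for engines, idents in group_by_engines(hits).items():
        print(sorted(engines), idents)
```

In a workflow along these lines, an FDR estimate would be computed separately within each group, since the abstract reports that the observed FDR differs markedly between identifications made by all three engines, by a pair of engines, or by a single engine.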

Publication types

Evaluation Study
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Computational Biology / methods*
Databases, Protein
Information Storage and Retrieval*
Models, Statistical
Peptides / analysis*
Proteins / analysis
Proteomics / methods*
Reproducibility of Results
Software

Substances

Peptides
Proteins
