Estimating and improving protein interaction error rates

Patrik D'haeseleer; George M Church

doi:10.1109/csb.2004.1332435

Proc IEEE Comput Syst Bioinform Conf. 2004:216-23. doi: 10.1109/csb.2004.1332435.

Authors

Patrik D'haeseleer¹, George M Church

Affiliation

¹ Lipper Center for Computational Genetics, Harvard Medical School, USA. patrik@genetics.med.harvard.edu

PMID: 16448015

Abstract

High-throughput protein interaction data sets are notoriously noisy. Although it is possible to focus on higher-reliability interactions by using only those backed by two or more lines of evidence, this approach discards the majority of the available data. Better use of the data could be made by incorporating the probabilities associated with all available interactions into the analysis. We present a novel method for estimating the error rates associated with specific protein interaction data sets, as well as with individual interactions given the data sets in which they appear. As a by-product, the method also yields an estimate of the total number of protein interactions in yeast. Certain types of false positive results can be identified and removed, resulting in a significant improvement in the quality of the data set. For co-purification data sets, we show how to trade off the "spoke" and "matrix" representations of interactions within co-purified groups of proteins to achieve an optimal false positive error rate.
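The "spoke" and "matrix" representations mentioned in the abstract can be illustrated with a minimal Python sketch. It follows the usual meaning of the two terms: the spoke model pairs the bait only with each co-purified protein, while the matrix model counts every pair within the purified group. The protein names and the functions `spoke_pairs` and `matrix_pairs` are hypothetical and chosen for illustration; this is not the authors' implementation and does not reproduce their error-rate estimation.

```python
from itertools import combinations


def spoke_pairs(bait, preys):
    """Spoke model: pair the bait with each co-purified prey only."""
    return {frozenset((bait, prey)) for prey in preys if prey != bait}


def matrix_pairs(bait, preys):
    """Matrix model: count every pair of proteins in the purified group."""
    members = sorted({bait, *preys})
    return {frozenset(pair) for pair in combinations(members, 2)}


# Hypothetical purification: bait "A" pulls down "B", "C" and "D".
bait, preys = "A", ["B", "C", "D"]
spoke = spoke_pairs(bait, preys)
matrix = matrix_pairs(bait, preys)

print(len(spoke))   # bait-prey pairs only
print(len(matrix))  # all pairs within the group
```

Every spoke pair is also a matrix pair, so the matrix model proposes more candidate interactions per purification but can include pairs that never touch directly. That is the kind of tradeoff between the two representations that the abstract refers to.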

Publication types

Comparative Study
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Algorithms*
Computer Simulation
Data Interpretation, Statistical
Gene Expression Profiling / methods*
Models, Biological*
Models, Statistical
Protein Interaction Mapping / methods*
Proteins / metabolism*
Reproducibility of Results
Sensitivity and Specificity
Two-Hybrid System Techniques

Substances

Proteins