Categorizing biases in high-confidence high-throughput protein-protein interaction data sets

Mol Cell Proteomics. 2011 Dec;10(12):M111.012500. doi: 10.1074/mcp.M111.012500. Epub 2011 Aug 29.


We characterized and evaluated the functional attributes of three high-confidence yeast protein-protein interaction data sets derived from affinity purification/mass spectrometry, protein-fragment complementation assay, and yeast two-hybrid experiments. The interacting proteins retrieved from these data sets formed distinct, partially overlapping sets with different protein-protein interaction characteristics. These differences were primarily a function of the experimental technologies used to recover the interactions. They affected the total coverage of interactions and were especially evident in the recovery of interactions among different functional classes of proteins. We found that the interaction data obtained by the yeast two-hybrid method were the least biased toward any particular functional characterization. In contrast, interacting proteins in the affinity purification/mass spectrometry and protein-fragment complementation assay data sets were over- and under-represented in specific functional categories, and these categories differed between the two data sets. We delineated how these differences affected protein complex organization in the network of interactions, in particular for strongly interacting complexes (e.g., those involved in RNA and protein synthesis) versus weakly and transiently interacting complexes (e.g., those involved in protein transport). We quantified methodological differences in detecting protein interactions within larger protein complexes, in the correlation of protein abundance between interacting proteins, and in the connectivity of essential proteins. In the latter case, we showed that minimizing inherent methodological biases resolved many of the ambiguous conclusions about the relationship between protein essentiality and protein connectivity. We used these findings to explain why biological insights obtained by analyzing data sets from different sources sometimes disagree or even contradict each other.
An important corollary of this work was that discrepancies in biological insights did not necessarily imply that one detection methodology was better or worse, but rather that, to a large extent, the insights reflected the methodological biases themselves. Consequently, interpreting the protein interaction data within their experimental or cellular context provided the best avenue for overcoming biases and inferring biological knowledge.
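The abstract does not state which statistical procedures were used. As a minimal sketch only, assuming a one-sided hypergeometric test for over- and under-representation of a functional category among the proteins recovered by one method, and a Spearman rank correlation (a nonparametric choice consistent with, but not confirmed by, the "Statistics, Nonparametric" MeSH term) for protein abundance between interacting partners, the two calculations could look like this. The function names `hypergeom_tails`, `ranks`, and `spearman` and all inputs are hypothetical illustrations, not the authors' code.

```python
from math import comb


def hypergeom_tails(k: int, n: int, K: int, N: int) -> tuple[float, float]:
    """Return (P[X >= k], P[X <= k]) for X ~ Hypergeometric(N, K, n).

    Hypothetical inputs (an assumed setup, not taken from the paper):
      N: proteins in the background set
      K: background proteins annotated to one functional category
      n: proteins recovered as interactors by one detection method
      k: recovered proteins that carry the annotation
    A small P[X >= k] suggests over-representation; a small P[X <= k]
    suggests under-representation.
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= n):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    lo, hi = max(0, n - (N - K)), min(n, K)
    # Exact probability mass for every feasible overlap size x.
    pmf = {x: comb(K, x) * comb(N - K, n - x) / total for x in range(lo, hi + 1)}
    over = sum(p for x, p in pmf.items() if x >= k)
    under = sum(p for x, p in pmf.items() if x <= k)
    return over, under


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for m in range(i, j + 1):
            r[order[m]] = avg
        i = j + 1
    return r


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined when all values are tied")
    return cov / (vx * vy) ** 0.5
```

For example, with a hypothetical background of 20 proteins, 5 of them in a category, observing all 5 among 8 recovered interactors gives an over-representation p-value of 455/125970, roughly 0.0036. When many categories are tested, the p-values would need a multiple-testing correction. For the abundance correlation, each interacting pair contributes one (abundance of A, abundance of B) point; because pairs are unordered, one common convention is to enter each pair in both orientations.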

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Data Interpretation, Statistical*
  • Molecular Sequence Annotation
  • Multiprotein Complexes / metabolism
  • Protein Interaction Mapping / methods*
  • Protein Interaction Maps*
  • Protein Transport
  • Reproducibility of Results
  • Saccharomyces cerevisiae Proteins / metabolism
  • Statistics, Nonparametric
  • Transcription, Genetic

Substances

  • Multiprotein Complexes
  • Saccharomyces cerevisiae Proteins