Negative protein-protein interaction datasets derived from large-scale two-hybrid experiments

Methods. 2012 Dec;58(4):343-8. doi: 10.1016/j.ymeth.2012.07.028. Epub 2012 Aug 4.


Negative protein-protein interaction datasets are needed for training and evaluation of interaction prediction methods, as well as validation of high-throughput interaction discovery experiments. In large-scale two-hybrid assays, the direct interaction of a large number of protein pairs is systematically probed. We present a simple method to harness two-hybrid data to obtain negative protein-protein interaction datasets, which we validated using other available experimental data. The method identifies interactions that were likely tested but not observed in a two-hybrid screen. For each negative interaction, a confidence score is defined as the shortest-path length between the two proteins in the interaction network derived from the two-hybrid experiment. We show that these high-quality negative datasets are particularly important when a specific biological context is considered, such as in the study of protein interaction specificity. We also illustrate the use of a negative dataset in the evaluation of the InterPreTS interaction prediction method.
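
The scoring idea described above can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes that a pair counts as "likely tested" when one protein is in the bait set and the other in the prey set of the same screen, and that the network is undirected and unweighted. The function names (`build_network`, `shortest_path_length`, `negative_candidates`) and the toy protein labels are hypothetical. The abstract does not state how path length maps to confidence, so the sketch only reports the raw length, with `None` for proteins that are not connected in the network.

```python
from collections import deque
from itertools import product


def build_network(positives):
    """Build an undirected adjacency map from observed (positive) pairs."""
    adj = {}
    for a, b in positives:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def shortest_path_length(adj, source, target):
    """Breadth-first search; returns None when target is unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def negative_candidates(baits, preys, positives):
    """Score every bait-prey pair that was not observed as interacting.

    Assumption: all bait x prey combinations were tested in the screen.
    Returns {(protein_a, protein_b): path length or None}, keys sorted.
    """
    adj = build_network(positives)
    observed = {frozenset(pair) for pair in positives}
    scored = {}
    for bait, prey in product(baits, preys):
        pair = frozenset((bait, prey))
        if bait == prey or pair in observed:
            continue  # skip self-pairs and observed interactions
        scored[tuple(sorted(pair))] = shortest_path_length(adj, bait, prey)
    return scored


# Toy screen with hypothetical proteins A-E (illustrative data only).
positives = [("A", "B"), ("B", "C"), ("C", "D")]
scores = negative_candidates(["A", "B"], ["C", "D", "E"], positives)
for pair, length in sorted(scores.items()):
    print(pair, length)
```

In this toy example the pair B-C is excluded because it was observed, while A-E receives `None` because E has no observed interactions and so never enters the network.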

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Area Under Curve
  • Computer Simulation
  • Evaluation Studies as Topic
  • Humans
  • Models, Biological
  • Protein Interaction Domains and Motifs
  • Protein Interaction Mapping / standards
  • Protein Interaction Maps*
  • ROC Curve
  • Reference Standards
  • Two-Hybrid System Techniques / standards*