Predicting protein-protein interactions from protein domains using a set cover approach

IEEE/ACM Trans Comput Biol Bioinform. 2007 Jan-Mar;4(1):78-87. doi: 10.1109/TCBB.2007.1001.


One goal of contemporary proteome research is the elucidation of cellular protein interactions. Based on currently available protein-protein interaction and domain data, we introduce a novel method, Maximum Specificity Set Cover (MSSC), for the prediction of protein-protein interactions. In our approach, we map the relationship between protein interactions and the domain architectures of the corresponding proteins to a generalized weighted set cover problem. Applying a greedy algorithm yields sets of domain interactions that explain the presence of protein interactions with the highest degree of specificity. Using domain and protein interaction data of S. cerevisiae, MSSC enables the prediction of previously unknown protein interactions, links that are well supported by a high tendency toward coexpression and functional homogeneity of the corresponding proteins. Focusing on concrete examples, we show that MSSC reliably predicts protein interactions in well-studied molecular systems, such as the 26S proteasome and RNA polymerase II of S. cerevisiae. We also show that the quality of the predictions is comparable to that of Maximum Likelihood Estimation, while MSSC is faster. This new algorithm and all data sets used are accessible through a Web portal at
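The mapping described in the abstract can be illustrated with a minimal sketch of the classic greedy heuristic for weighted set cover. This is not the authors' MSSC implementation: the abstract does not give the specificity weights or the details of the generalization, so the function names, the toy proteins and domains, and the unit weights below are all assumptions made for illustration. Each observed protein interaction is an element to be covered, and each candidate domain pair covers the interactions whose two proteins carry those domains.

```python
from itertools import product


def candidate_domain_pairs(protein_domains, interactions):
    """Map each unordered domain pair to the observed interactions it could explain."""
    covers = {}
    for p, q in interactions:
        for a, b in product(protein_domains[p], protein_domains[q]):
            key = tuple(sorted((a, b)))
            covers.setdefault(key, set()).add(frozenset((p, q)))
    return covers


def greedy_weighted_set_cover(universe, covers, weight):
    """Classic greedy heuristic: repeatedly pick the set with the lowest
    weight per newly covered element until everything is covered."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best, best_ratio = None, None
        for key in sorted(covers):  # sorted for deterministic tie-breaking
            gain = len(covers[key] & uncovered)
            if gain == 0:
                continue
            ratio = weight[key] / gain
            if best is None or ratio < best_ratio:
                best, best_ratio = key, ratio
        if best is None:  # remaining elements cannot be covered
            break
        chosen.append(best)
        uncovered -= covers[best]
    return chosen


# Hypothetical toy data (not taken from the paper).
protein_domains = {"P1": {"A"}, "P2": {"B"}, "P3": {"A", "C"}, "P4": {"B"}}
interactions = [("P1", "P2"), ("P3", "P4")]

covers = candidate_domain_pairs(protein_domains, interactions)
universe = {frozenset(pair) for pair in interactions}

# Assumption: unit weights stand in for the specificity-based weights of MSSC.
unit_weights = {key: 1.0 for key in covers}
selected = greedy_weighted_set_cover(universe, covers, unit_weights)
print(selected)
```

With unit weights, the domain pair shared by both interactions is selected first because it covers the most elements per unit of weight. In a specificity-driven scheme, the weights would presumably penalize domain pairs that also occur in protein pairs not known to interact, but that choice is left open here.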

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Computational Biology / methods
  • Databases, Protein
  • Gene Expression Profiling
  • Internet
  • Likelihood Functions
  • Models, Statistical*
  • Oligonucleotide Array Sequence Analysis
  • Proteasome Endopeptidase Complex / chemistry
  • Proteasome Endopeptidase Complex / genetics
  • Proteasome Endopeptidase Complex / metabolism
  • Protein Interaction Mapping / methods
  • Protein Structure, Tertiary*
  • Proteins / chemistry
  • Proteins / genetics
  • Proteins / metabolism
  • Proteomics / methods*
  • RNA Polymerase II / chemistry
  • RNA Polymerase II / genetics
  • RNA Polymerase II / metabolism
  • Saccharomyces cerevisiae Proteins / chemistry
  • Saccharomyces cerevisiae Proteins / genetics
  • Saccharomyces cerevisiae Proteins / metabolism

Substances

  • Proteins
  • Saccharomyces cerevisiae Proteins
  • RNA Polymerase II
  • Proteasome Endopeptidase Complex
  • ATP dependent 26S protease