Identifying protein complexes in high-throughput protein interaction screens using an infinite latent feature model

Pac Symp Biocomput. 2006;231-42.


We propose a Bayesian approach to identify protein complexes and their constituents from high-throughput protein-protein interaction screens. An infinite latent feature model that allows for multi-complex membership by individual proteins is coupled with a graph diffusion kernel that evaluates the likelihood of two proteins belonging to the same complex. Gibbs sampling is then used to infer a catalog of protein complexes from the interaction screen data. An advantage of this model is that it places no prior constraints on the number of complexes and automatically infers the number of significant complexes from the data. Validation results using affinity purification/mass spectrometry experimental data from yeast RNA-processing complexes indicate that our method is capable of partitioning the data in a biologically meaningful way. A supplementary web site containing larger versions of the figures is available at
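The abstract does not name the prior behind its "infinite latent feature model". A common choice for such models is the Indian buffet process (IBP), and the sketch below assumes that choice purely for illustration. It draws a binary protein-by-complex membership matrix from an IBP prior. The number of complexes is not fixed in advance, and a protein may belong to several complexes. The function names, the `alpha` parameter, and the example values are hypothetical. The diffusion-kernel likelihood and the Gibbs sampler described in the abstract are not implemented here.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw a Poisson(rate) count with Knuth's method (adequate for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_ibp(num_proteins, alpha, rng):
    """Sample a binary membership matrix from an Indian buffet process prior.

    Rows are proteins, columns are latent complexes. `alpha` (an assumed
    concentration parameter) controls how many complexes tend to appear.
    """
    columns = []  # columns[k][i] == 1 if protein i belongs to complex k
    for i in range(num_proteins):
        # Join each existing complex with probability proportional to
        # the number of earlier proteins already in it.
        for col in columns:
            members = sum(col)
            col.append(1 if rng.random() < members / (i + 1) else 0)
        # Open a Poisson-distributed number of new complexes for this protein.
        for _ in range(sample_poisson(alpha / (i + 1), rng)):
            columns.append([0] * i + [1])
    # Transpose so that each row describes one protein.
    return [[col[i] for col in columns] for i in range(num_proteins)]


if __name__ == "__main__":
    rng = random.Random(0)
    memberships = sample_ibp(num_proteins=6, alpha=2.0, rng=rng)
    for protein_index, row in enumerate(memberships):
        print(protein_index, row)
```

In a full model of the kind the abstract describes, a prior like this would be combined with a likelihood over the interaction data, and posterior samples of the membership matrix would be drawn instead of prior samples.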

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Bayes Theorem
  • Computational Biology
  • Computer Simulation
  • DNA-Directed RNA Polymerases / chemistry
  • DNA-Directed RNA Polymerases / isolation & purification
  • Likelihood Functions
  • Mass Spectrometry
  • Models, Molecular
  • Multiprotein Complexes* / chemistry
  • Multiprotein Complexes* / isolation & purification
  • Saccharomyces cerevisiae Proteins / chemistry
  • Saccharomyces cerevisiae Proteins / isolation & purification


Substances

  • Multiprotein Complexes
  • Saccharomyces cerevisiae Proteins
  • DNA-Directed RNA Polymerases