From pull-down data to protein interaction networks and complexes with biological relevance

Bioinformatics. 2008 Apr 1;24(7):979-86. doi: 10.1093/bioinformatics/btn036. Epub 2008 Feb 26.


Motivation: Recent improvements in high-throughput Mass Spectrometry (MS) technology have expedited genome-wide discovery of protein-protein interactions by providing a capability of detecting protein complexes in a physiological setting. Computational inference of protein interaction networks and protein complexes from MS data are challenging. Advances are required in developing robust and seamlessly integrated procedures for assessment of protein-protein interaction affinities, mathematical representation of protein interaction networks, discovery of protein complexes and evaluation of their biological relevance.

Results: A multi-step but easy-to-follow framework for identifying protein complexes from MS pull-down data is introduced. It assesses interaction affinity between two proteins based on similarity of their co-purification patterns derived from MS data. It constructs a protein interaction network by adopting a knowledge-guided threshold selection method. Based on the network, it identifies protein complexes and infers their core components using a graph-theoretical approach. It deploys a statistical evaluation procedure to assess biological relevance of each found complex. On Saccharomyces cerevisiae pull-down data, the framework outperformed other more complicated schemes by at least 10% in F(1)-measure and identified 610 protein complexes with high-functional homogeneity based on the enrichment in Gene Ontology (GO) annotation. Manual examination of the complexes brought forward the hypotheses on cause of false identifications. Namely, co-purification of different protein complexes as mediated by a common non-protein molecule, such as DNA, might be a source of false positives. Protein identification bias in pull-down technology, such as the hydrophilic bias could result in false negatives.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Biology / methods
  • Computer Simulation
  • Databases, Protein*
  • Gene Expression Profiling / methods*
  • Information Storage and Retrieval / methods
  • Models, Biological*
  • Peptide Mapping / methods
  • Protein Interaction Mapping / methods*
  • Proteins / chemistry*
  • Proteins / metabolism*
  • Signal Transduction / physiology*
  • Structure-Activity Relationship
  • Systems Integration


  • Proteins