Detection of locally over-represented GO terms in protein-protein interaction networks

J Comput Biol. 2010 Mar;17(3):443-57. doi: 10.1089/cmb.2009.0165.


High-throughput methods for identifying protein-protein interactions produce increasingly complex and intricate interaction networks. These networks are extremely rich in information, but extracting biologically meaningful hypotheses from them and representing them in a human-readable manner is challenging. We propose a method to identify Gene Ontology terms that are locally over-represented in a subnetwork of a given biological network. Specifically, we propose several methods to evaluate the degree of clustering of proteins associated to a particular GO term in both weighted and unweighted PPI networks, and describe efficient methods to estimate the statistical significance of the observed clustering. We show, using Monte Carlo simulations, that our best approximation methods accurately estimate the true p-value, for random scale-free graphs as well as for actual yeast and human networks. When applied to these two biological networks, our approach recovers many known complexes and pathways, but also suggests potential functions for many subnetworks. Online Supplementary Material is available at

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cluster Analysis
  • Genes*
  • Humans
  • Protein Binding
  • Protein Interaction Mapping*
  • Saccharomyces cerevisiae / metabolism
  • Saccharomyces cerevisiae Proteins / metabolism
  • Time Factors


  • Saccharomyces cerevisiae Proteins