Novel semantic similarity measure improves an integrative approach to predicting gene functional associations

BMC Syst Biol. 2013 Mar 14;7:22. doi: 10.1186/1752-0509-7-22.


Background: Elucidating direct and indirect protein interactions and gene associations is required to fully understand the workings of the cell. This can be achieved through both low- and high-throughput biological experiments and in silico methods. We present GAP (Gene functional Association Predictor), an integrative method for predicting and characterizing gene functional associations. GAP integrates different biological features using a novel taxonomy-based semantic similarity measure to predict and prioritize high-quality putative gene associations. The proposed similarity measure increases the information gained from the available gene annotations. The annotation information is drawn from several public pathway databases, Gene Ontology annotations, and drug and disease associations from the scientific literature.
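
To make the idea of a taxonomy-based semantic similarity concrete, the following is a minimal sketch of a generic information-content measure (Resnik-style) over a toy term hierarchy. It is not GAP's measure, which the abstract does not specify; the taxonomy, gene names, annotations, and function names are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy taxonomy (child term -> parent term); not real GO structure.
PARENT = {
    "kinase activity": "catalytic activity",
    "phosphatase activity": "catalytic activity",
    "catalytic activity": "molecular function",
    "DNA binding": "binding",
    "binding": "molecular function",
}

# Hypothetical gene annotations (gene -> set of taxonomy terms).
ANNOTATIONS = {
    "geneA": {"kinase activity"},
    "geneB": {"phosphatase activity"},
    "geneC": {"DNA binding"},
    "geneD": {"kinase activity", "DNA binding"},
}


def ancestors(term):
    """Return the term together with all of its ancestors up to the root."""
    found = {term}
    while term in PARENT:
        term = PARENT[term]
        found.add(term)
    return found


def information_content(term):
    """-log of the fraction of genes annotated to the term or a descendant.

    Assumes every term has at least one annotated gene, as in the toy data.
    """
    count = sum(
        1
        for terms in ANNOTATIONS.values()
        if any(term in ancestors(t) for t in terms)
    )
    return -math.log(count / len(ANNOTATIONS))


def term_similarity(term1, term2):
    """Resnik-style score: information content of the most informative shared ancestor."""
    shared = ancestors(term1) & ancestors(term2)
    return max(information_content(t) for t in shared)


def gene_similarity(gene1, gene2):
    """Score a gene pair by its best-matching pair of annotation terms."""
    return max(
        term_similarity(a, b)
        for a in ANNOTATIONS[gene1]
        for b in ANNOTATIONS[gene2]
    )
```

In this toy setting, two genes annotated under the same specific branch (geneA and geneB, both under "catalytic activity") score higher than two genes that share only the root term (geneA and geneC), which is the behavior a taxonomy-aware measure is meant to capture.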

Results: We evaluated GAP by comparing its prediction performance with that of several other well-known functional interaction prediction tools over a comprehensive dataset of known direct and indirect interactions, and observed significantly better prediction performance. We also selected a small set of GAP's highly scored novel predicted pairs (i.e., pairs not currently found in any known database or dataset) and, by manually searching the literature for publicly accessible experimental evidence, confirmed different categories of predicted functional associations with available evidence of interaction. We also provided additional supporting evidence for a subset of the predicted functionally associated pairs using an expert-curated database of genes associated with autism spectrum disorders.

Conclusions: GAP's predicted "functional interactome" contains ≈1M highly scored predicted functional associations, of which about 90% are novel (i.e., not experimentally validated). GAP's novel predictions connect disconnected components and singletons to the main connected component of the known interactome. GAP can therefore be a valuable resource for biologists, providing corroborating evidence for potential direct or indirect interactions and facilitating their prioritization for experimental validation. GAP is freely accessible through a web portal:

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Child
  • Child Development Disorders, Pervasive / genetics
  • Child Development Disorders, Pervasive / metabolism
  • Computational Biology / methods*
  • Databases, Genetic
  • Humans
  • Protein Interaction Maps
  • Proteins / genetics*
  • Proteins / metabolism*


Substances

  • Proteins