Motivation: Protein-protein interactions (PPIs) are pivotal for many biological processes, and similarity in Gene Ontology (GO) annotation has been found to be one of the strongest indicators of PPI. Most GO-driven algorithms for PPI inference combine machine learning and semantic similarity techniques. We introduce the concept of inducers as a method to integrate both approaches more effectively, leading to superior prediction accuracies.
Results: An inducer (ULCA) in combination with a Random Forest classifier compares favorably to several sequence-based methods, semantic similarity measures and multi-kernel approaches. On a newly created set of high-quality interaction data, the proposed method achieves high cross-species prediction accuracies (area under the ROC curve of up to 0.88), rendering it a valuable companion to sequence-based methods.
Availability: Software and datasets are available at http://bioinformatics.org.au/go2ppi/