Background: Network query problem aligns a small query network with an arbitrarily large target network. The complexity of this problem grows exponentially with the number of nodes in the query network if confidence in the optimality of result is desired. Scaling this problem to large query and target networks remains to be a challenge.
Results: In this article, we develop a novel index structure that dramatically reduces the cost of the network query problem. Our index structure maintains a small set of reference networks where each reference network is a small, carefully chosen subnetwork from the target network. Along with each reference, we also store all of its non-overlapping and statistically significant alignments with the target network. Given a query network, we first align the query with the reference networks. If the alignment with a reference network yields a sufficiently large score, we compute an upper-bound to the alignment score between the query and the target using the alignments of that reference and the target (which is stored in our index). If the upper-bound is large enough, we employ a second round of alignment between the query and the target by respecting the mapping found in the first alignment. Our experiments on protein-protein interaction networks demonstrate that our index achieves a significant speed-up in running time over the state-of-the-art methods such as ColT. The alignment subnetworks obtained by our method are also statistically significant. Finally, we observe that our method finds biologically and statistically significant alignments across multiple species.
Conclusions: We developed a reference network based indexing structure that accelerates network query and produces functionally and statistically significant results.