Motivation: In model organisms such as yeast, large databases of protein-protein and protein-DNA interactions have become an extremely important resource for the study of protein function, evolution, and gene regulatory dynamics. In this paper we demonstrate that by integrating these interactions with widely-available mRNA expression data, it is possible to generate concrete hypotheses for the underlying mechanisms governing the observed changes in gene expression. To perform this integration systematically and at large scale, we introduce an approach for screening a molecular interaction network to identify active subnetworks, i.e., connected regions of the network that show significant changes in expression over particular subsets of conditions. The method we present here combines a rigorous statistical measure for scoring subnetworks with a search algorithm for identifying subnetworks with high score.
Results: We evaluated our procedure on a small network of 332 genes and 362 interactions and a large network of 4160 genes containing all 7462 protein-protein and protein-DNA interactions in the yeast public databases. In the case of the small network, we identified five significant subnetworks that covered 41 out of 77 (53%) of all significant changes in expression. Both network analyses returned several top-scoring subnetworks with good correspondence to known regulatory mechanisms in the literature. These results demonstrate how large-scale genomic approaches may be used to uncover signalling and regulatory pathways in a systematic, integrative fashion.