Machine learning methods for learning network structure are applied to quantitative proteomics experiments and reverse-engineer intracellular signal transduction networks. They provide insight into the rewiring of signaling within the context of a disease or a phenotype. To learn the causal patterns of influence between proteins in the network, the methods require experiments that include targeted interventions that fix the activity of specific proteins. However, the interventions are costly and add experimental complexity. We describe an active learning strategy for selecting optimal interventions. Our approach takes as inputs pathway databases and historic data sets, expresses them in form of prior probability distributions on network structures, and selects interventions that maximize their expected contribution to structure learning. Evaluations on simulated and real data show that the strategy reduces the detection error of validated edges as compared with an unguided choice of interventions and avoids redundant interventions, thereby increasing the effectiveness of the experiment.
Keywords: Bayesian network; active learning; biological networks; causal inference; machine learning..