Learning gene networks under SNP perturbations using eQTL datasets

PLoS Comput Biol. 2014 Feb 27;10(2):e1003420. doi: 10.1371/journal.pcbi.1003420. eCollection 2014 Feb.

Abstract

The standard approach for identifying gene networks is based on experimental perturbations of gene regulatory systems, such as gene knock-out experiments, followed by genome-wide profiling of differential gene expression. However, this approach is significantly limited in that it is not possible to perturb more than one or two genes simultaneously to discover complex gene interactions or to distinguish between direct and indirect downstream regulation of the differentially expressed genes. As an alternative, genetical genomics studies have been proposed that treat naturally occurring genetic variants as potential perturbants of the gene regulatory system and recover gene networks by analyzing population gene-expression and genotype data. Despite the many advantages of genetical genomics data analysis, its widespread application has been prevented by a computational challenge: the effects of multifactorial genetic perturbations must be decoded simultaneously from the data. In this article, we propose a statistical framework for learning gene networks that overcomes the limitations of experimental perturbation methods and addresses the challenges of genetical genomics analysis. We introduce a new statistical model, called a sparse conditional Gaussian graphical model, and describe an efficient learning algorithm that simultaneously decodes the perturbations of the gene regulatory system by a large number of SNPs to identify a gene network along with the expression quantitative trait loci (eQTLs) that perturb this network. While our statistical model captures direct genetic perturbations of the gene network, performing inference on the probabilistic graphical model yields detailed characterizations of how the direct SNP perturbation effects propagate through the gene network to perturb other genes indirectly. We demonstrate our statistical method using HapMap-simulated and yeast eQTL datasets. In particular, the yeast gene network identified computationally by our method under SNP perturbations is well supported by results from experimental perturbation studies related to the DNA replication stress response.
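
The abstract does not state the model's equations, so the following is only a minimal sketch of the inference step it describes. It assumes the commonly used conditional Gaussian graphical model parameterization, p(y | x) proportional to exp(-1/2 y' Theta_yy y - x' Theta_xy y), where y holds gene expression levels and x holds SNP genotypes. Under that assumption, nonzero entries of Theta_xy are direct SNP perturbations, nonzero off-diagonal entries of Theta_yy are gene network edges, and the overall (direct plus indirect) SNP effects are B = -Theta_xy Theta_yy^{-1}. The function name, variable names, and toy numbers are illustrative assumptions, not taken from the paper, and the sketch covers inference only, not the sparse learning algorithm.

```python
import numpy as np


def marginal_snp_effects(theta_yy, theta_xy):
    """Overall SNP-to-gene effects B, so that E[y | x] = B.T @ x.

    theta_yy: (genes x genes) symmetric positive-definite network parameters.
    theta_xy: (snps x genes) direct SNP perturbation parameters.
    Assumes p(y | x) is proportional to exp(-0.5 y' theta_yy y - x' theta_xy y).
    """
    # Solve theta_yy @ Z = theta_xy.T instead of forming an explicit inverse;
    # transposing Z gives theta_xy @ inv(theta_yy) because theta_yy is symmetric.
    return -np.linalg.solve(theta_yy, theta_xy.T).T


# Toy network (hypothetical numbers): genes 0 and 1 share an edge, gene 2 is isolated.
theta_yy = np.array([[2.0, -1.0, 0.0],
                     [-1.0, 2.0, 0.0],
                     [0.0, 0.0, 1.0]])

# SNP 0 directly perturbs only gene 0; SNP 1 directly perturbs only gene 2.
theta_xy = np.array([[1.0, 0.0, 0.0],
                     [0.0, 0.0, 0.5]])

B = marginal_snp_effects(theta_yy, theta_xy)
print(np.round(B, 3))
```

In this toy case SNP 0 has no direct link to gene 1 (the corresponding entry of `theta_xy` is zero), yet its entry in `B` is nonzero because the perturbation of gene 0 propagates along the gene 0 to gene 1 edge. SNP 1 affects only the isolated gene 2.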

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Artificial Intelligence*
  • Computational Biology
  • Computer Simulation
  • DNA, Fungal / genetics
  • Databases, Nucleic Acid*
  • Gene Expression Profiling
  • Gene Regulatory Networks*
  • Humans
  • Linear Models
  • Models, Genetic
  • Models, Statistical
  • Polymorphism, Single Nucleotide*
  • Quantitative Trait Loci*
  • Saccharomyces cerevisiae / genetics

Substances

  • DNA, Fungal

Grants and funding

This material is based upon work supported by an NSF CAREER Award under grant No. MCB-1149885, a Sloan Research Fellowship, and an Okawa Foundation Research Grant. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.