Learning gene networks under SNP perturbations using eQTL datasets
- PMID: 24586125
- PMCID: PMC3937098
- DOI: 10.1371/journal.pcbi.1003420
Erratum in
- PLoS Comput Biol. 2014 Apr;10(4):e1003608
Abstract
The standard approach for identifying gene networks is based on experimental perturbations of gene regulatory systems, such as gene knock-out experiments, followed by genome-wide profiling of differential gene expression. However, this approach is significantly limited: it is not possible to perturb more than one or two genes simultaneously in order to discover complex gene interactions or to distinguish between direct and indirect downstream regulation of the differentially expressed genes. As an alternative, genetical genomics studies have been proposed that treat naturally occurring genetic variants as potential perturbants of the gene regulatory system and recover gene networks by analyzing population gene-expression and genotype data. Despite the many advantages of genetical genomics data analysis, its widespread application has been hindered by a computational challenge: the effects of multifactorial genetic perturbations must be decoded simultaneously from the data. In this article, we propose a statistical framework for learning gene networks that overcomes the limitations of experimental perturbation methods and addresses the challenges of genetical genomics analysis. We introduce a new statistical model, called a sparse conditional Gaussian graphical model (sparse CGGM), and describe an efficient learning algorithm that simultaneously decodes the perturbations of the gene regulatory system by a large number of SNPs to identify a gene network along with the expression quantitative trait loci (eQTLs) that perturb this network. While our statistical model captures direct genetic perturbations of the gene network, performing inference on the probabilistic graphical model yields detailed characterizations of how the direct SNP perturbation effects propagate through the gene network to perturb other genes indirectly. We demonstrate our statistical method using HapMap-simulated and yeast eQTL datasets.
In particular, the yeast gene network identified computationally by our method under SNP perturbations is well supported by the results from experimental perturbation studies related to DNA replication stress response.
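The model and learning problem described in the abstract can be sketched in NumPy. This is a minimal illustration, not the authors' learning algorithm: it assumes the common Gaussian conditional random field parametrization p(y | x) ∝ exp(-½ yᵀΛy - xᵀΘy), in which the q × q matrix Λ encodes the gene network and the p × q matrix Θ encodes direct SNP perturbations, and it swaps in plain proximal gradient descent with L1 penalties on Θ and on the off-diagonal entries of Λ. Under this parametrization E[y | x] = -Λ⁻¹Θᵀx, so the overall SNP effects after propagation through the network are B = -ΘΛ⁻¹. The function names, penalty weights, and simulated data are hypothetical.

```python
import numpy as np


def soft_threshold(A, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||A||_1."""
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)


def smooth_loss(Lam, Theta, Syy, Syx, Sxx):
    """Twice the average negative conditional log-likelihood, constants dropped,
    under the ASSUMED form p(y|x) ∝ exp(-0.5 y'Lam y - x'Theta y)."""
    sign, logdet = np.linalg.slogdet(Lam)
    if sign <= 0:  # Lam must stay positive definite
        return np.inf
    Lam_inv = np.linalg.inv(Lam)
    return (-logdet + np.trace(Syy @ Lam) + 2.0 * np.trace(Syx @ Theta)
            + np.trace(Lam_inv @ Theta.T @ Sxx @ Theta))


def gradients(Lam, Theta, Syy, Syx, Sxx):
    """Gradients of smooth_loss with respect to Lam and Theta."""
    Lam_inv = np.linalg.inv(Lam)
    g_lam = -Lam_inv + Syy - Lam_inv @ Theta.T @ Sxx @ Theta @ Lam_inv
    g_theta = 2.0 * Syx.T + 2.0 * Sxx @ Theta @ Lam_inv
    return 0.5 * (g_lam + g_lam.T), g_theta  # symmetrize against round-off


def fit_sparse_cggm(X, Y, lam_net=0.05, lam_snp=0.05, n_iter=300, max_halvings=50):
    """Hypothetical solver: proximal gradient with backtracking, L1 penalty on Theta
    and on the off-diagonal entries of Lam (the gene-network edges)."""
    X, Y = X - X.mean(axis=0), Y - Y.mean(axis=0)  # the model has no intercept
    n, p = X.shape
    q = Y.shape[1]
    Syy, Syx, Sxx = Y.T @ Y / n, Y.T @ X / n, X.T @ X / n
    off = ~np.eye(q, dtype=bool)
    Lam, Theta = np.eye(q), np.zeros((p, q))

    def penalty(L, T):
        return lam_net * np.abs(L[off]).sum() + lam_snp * np.abs(T).sum()

    f = smooth_loss(Lam, Theta, Syy, Syx, Sxx)
    history = [f + penalty(Lam, Theta)]
    for _ in range(n_iter):
        g_lam, g_theta = gradients(Lam, Theta, Syy, Syx, Sxx)
        t = 1.0
        for _ in range(max_halvings):
            L_new = Lam - t * g_lam
            L_new[off] = soft_threshold(L_new[off], t * lam_net)  # diagonal unpenalized
            T_new = soft_threshold(Theta - t * g_theta, t * lam_snp)
            f_new = smooth_loss(L_new, T_new, Syy, Syx, Sxx)
            dL, dT = L_new - Lam, T_new - Theta
            bound = (f + np.sum(g_lam * dL) + np.sum(g_theta * dT)
                     + (np.sum(dL ** 2) + np.sum(dT ** 2)) / (2.0 * t))
            if f_new <= bound:  # sufficient decrease, and Lam still positive definite
                break
            t *= 0.5
        else:
            break  # no acceptable step found: stop early
        Lam, Theta, f = L_new, T_new, f_new
        history.append(f + penalty(Lam, Theta))
    return Lam, Theta, history


# Toy data (hypothetical): 5 genes in a chain network, 20 SNPs, 3 of them eQTLs.
rng = np.random.default_rng(0)
n, p, q = 500, 20, 5
Lam_true = np.eye(q) + 0.3 * (np.eye(q, k=1) + np.eye(q, k=-1))
Theta_true = np.zeros((p, q))
Theta_true[0, 0], Theta_true[1, 3], Theta_true[2, 4] = 1.0, -1.0, 0.8
Sigma_true = np.linalg.inv(Lam_true)
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)  # genotypes coded 0/1/2
Y = -X @ Theta_true @ Sigma_true + rng.multivariate_normal(np.zeros(q), Sigma_true, size=n)

Lam_hat, Theta_hat, history = fit_sparse_cggm(X, Y)
B_hat = -Theta_hat @ np.linalg.inv(Lam_hat)  # overall (direct + propagated) SNP effects
B_true = -Theta_true @ Sigma_true
```

The backtracking step does two jobs: an infeasible Λ gets an infinite loss, so the step is halved until Λ is positive definite again, and the sufficient-decrease test guarantees that the penalized objective recorded in `history` never increases.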
Conflict of interest statement
The authors have declared that no competing interests exist.
Figures
… The edges between gene-expression traits and SNPs indicate the direct perturbations of the gene-expression traits by the given SNPs. The SNP nodes are shaded to show that the SNPs are conditioning variables in the conditional probability model. (B) Illustration of how the effects of a SNP's direct perturbation of the gene network propagate through the network, as obtained by performing inference on the sparse CGGM in Panel (A). While the SNP perturbs two gene-expression traits directly, this effect propagates through the network to perturb the expressions of other genes indirectly. The two directly perturbed genes are shown as diamond-shaped nodes. The size and color shade of each node indicate the strength of indirect perturbation of the given gene-expression trait by the SNP, with a larger and darker node for a stronger perturbation. (C) The portion of the overall indirect SNP perturbation effects in Panel (B) that arose from the propagation of the SNP's direct perturbation of the first directly perturbed gene. (D) The corresponding portion for the second directly perturbed gene. Within our statistical framework, we can perform inference on the sparse CGGM in Panel (A) to obtain the indirect perturbations in Panel (B), and then decompose these indirect perturbations into Panels (C) and (D) in a principled manner.
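The decomposition into Panels (C) and (D) follows from linearity. Assuming, as a hypothetical parametrization, that the overall SNP effects are B = -ΘΛ⁻¹, where Λ is the gene-network precision matrix and Θ holds the direct SNP perturbations, row j of B is the sum over directly perturbed genes k of -Θ[j, k]·(Λ⁻¹)[k, :]. Each term is the portion of SNP j's effect that propagates from gene k. A minimal sketch with a toy chain-shaped network (the function name and all values are hypothetical):

```python
import numpy as np


def decompose_snp_effect(Lam, Theta, snp):
    """Split the overall effect of one SNP, B[snp, :] = -(Theta @ inv(Lam))[snp, :],
    into one additive component per directly perturbed gene k (Theta[snp, k] != 0)."""
    Sigma = np.linalg.inv(Lam)
    total = -(Theta[snp] @ Sigma)
    parts = {int(k): -Theta[snp, k] * Sigma[k] for k in np.flatnonzero(Theta[snp])}
    return total, parts


# Toy network: 6 genes in a chain; one SNP directly perturbs genes 1 and 4.
q = 6
Lam = np.eye(q) + 0.4 * (np.eye(q, k=1) + np.eye(q, k=-1))
Theta = np.zeros((1, q))
Theta[0, 1], Theta[0, 4] = 1.0, -0.5

total, parts = decompose_snp_effect(Lam, Theta, snp=0)
for gene, component in parts.items():
    print(f"propagated from gene {gene}:", np.round(component, 3))
print("overall effect:", np.round(total, 3))
```

In this toy example every gene receives a nonzero overall effect even though only two genes are perturbed directly, because Λ⁻¹ is dense although Λ is sparse.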
… Columns show the parameters for gene-network edge weights, the strengths of direct SNP perturbations, and the strengths of indirect SNP perturbations, respectively. In the middle and right columns, the parameter matrices are shown with gene-expression traits in rows and SNPs in columns. White pixels represent zero elements and darker pixels represent non-zero elements of the parameter matrix. The true model parameters are shown in Panel (A), and the estimated parameters are shown for (B) sparse CGGM, (C) MRCE, and (D) GFlasso. MRCE and GFlasso use the standard regression model for eQTL mapping and thus provide a single summary of SNP effects on gene expressions. GFlasso focuses only on the task of eQTL mapping and thus does not provide an estimate of the gene network.
… (rows) and … (columns). Each precision-recall curve was obtained as an average over results from 50 simulated datasets with 30 gene-expression traits and 500 SNPs.
… (rows) and … (columns). For sparse CGGMs, each panel shows two precision-recall curves, one for eQTLs with direct perturbation effects and another for eQTLs with indirect perturbation effects, whereas for MRCE and GFlasso, the results are shown only for the association strengths.
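The precision-recall curves described in these captions score how well the nonzero pattern of an estimated parameter matrix matches the true one. A minimal sketch, assuming that entries are ranked by absolute estimated value and that an entry counts as a true positive when the corresponding true parameter is nonzero; averaging over simulated datasets is omitted, ties are broken by position, and the function name is hypothetical:

```python
import numpy as np


def support_precision_recall(est, truth):
    """Precision and recall of support recovery as the cutoff on |est| is lowered
    one entry at a time; nonzero entries of `truth` count as positives."""
    scores = np.abs(est).ravel()
    labels = np.abs(truth).ravel() > 0
    order = np.argsort(-scores, kind="stable")  # largest |estimate| first
    true_pos = np.cumsum(labels[order])
    n_selected = np.arange(1, scores.size + 1)
    precision = true_pos / n_selected
    recall = true_pos / max(int(labels.sum()), 1)
    return precision, recall


truth = np.array([[1.0, 0.0], [0.0, 2.0]])
est = np.array([[0.9, 0.1], [0.0, 1.5]])
precision, recall = support_precision_recall(est, truth)
print(precision, recall)
```

Applied to an estimated network matrix, a direct-perturbation matrix, or an indirect-perturbation matrix, the same function yields one curve per parameter type.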
… (B) Precision-recall curves for the recovery of eQTLs. (C) Prediction errors. The results were obtained as an average over 50 simulated datasets with 30 gene-expression traits and 500 SNPs.
… (B) Precision-recall curves for the recovery of eQTLs. (C) Prediction errors. The results were obtained as an average over 30 simulated datasets with 500 gene-expression traits and 1,000 SNPs.
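Prediction errors such as those in Panel (C) can be computed once Λ and Θ are estimated. A minimal sketch, assuming the Gaussian CRF parametrization p(y | x) ∝ exp(-½ yᵀΛy - xᵀΘy), so that expression is predicted by the conditional mean E[y | x] = -Λ⁻¹Θᵀx, and assuming mean squared error over held-out samples and traits as the metric; the error measure used in the figures may differ, and the function names are hypothetical:

```python
import numpy as np


def predict_expression(X, Lam, Theta):
    """Conditional mean of expression for each row of X: -X @ Theta @ inv(Lam).
    Uses a linear solve instead of an explicit inverse (Lam is symmetric)."""
    return -np.linalg.solve(Lam, (X @ Theta).T).T


def prediction_error(X_test, Y_test, Lam, Theta):
    """Mean squared error over all held-out samples and gene-expression traits."""
    residuals = Y_test - predict_expression(X_test, Lam, Theta)
    return float(np.mean(residuals ** 2))


Lam = 2.0 * np.eye(2)
Theta = np.array([[1.0, 0.0], [0.0, -1.0]])
X_test = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Y_pred = predict_expression(X_test, Lam, Theta)
err = prediction_error(X_test, Y_pred + 1.0, Lam, Theta)  # every residual is 1
```

Regression-based methods that estimate B directly would predict with X @ B instead; the comparison is then on the same held-out data.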
… with the number of SNPs fixed at …, and (B) varying the number of SNPs with the number of gene-expression traits fixed at …. The results for MRCE were obtained using the approximate algorithm.
… The diamond-shaped nodes represent gene-expression traits that are directly perturbed by the SNP, whereas the round, colored nodes represent genes whose expressions are indirectly perturbed by the SNP. The color shade and size of each node indicate the strength of the SNP's perturbation of that gene-expression trait. Our statistical framework allows the overall indirect SNP perturbation effects in Panel (A) to be decomposed into the components that arose from the propagation of the direct perturbation effects on each of (B) TFS1, (C) HSP26, (D) RTN2, and (E) GAD1 by the given SNP.
