G3 (Bethesda). 2018 Aug 30;8(9):3059-3068.
doi: 10.1534/g3.118.200571.

GWAS With Heterogeneous Data: Estimating the Fraction of Phenotypic Variation Mediated by Gene Expression Data

Free PMC article

Eriko Sasaki et al. G3 (Bethesda). 2018.

Abstract

Intermediate phenotypes such as gene expression values can be used to elucidate the mechanisms by which genetic variation causes phenotypic variation, but jointly analyzing such heterogeneous data is far from trivial. Here we extend a so-called mediation model to handle the confounding effects of genetic background, and use it to analyze flowering time variation in Arabidopsis thaliana, focusing in particular on the central role played by the key regulator FLOWERING LOCUS C (FLC). FLC polymorphism and FLC expression are both strongly correlated with flowering time variation, but the effect of the former is only partly mediated through the latter. Furthermore, the latter also reflects genetic background effects. We demonstrate that it is possible to partition these effects, shedding light on the complex regulatory network that underlies flowering time variation.
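
As a rough illustration of what "partly mediated" means, the sketch below fits a classical three-regression mediation model to simulated data. It corresponds only to a plain linear model with no correction for genetic background; the mixed-model extension described in the abstract would additionally need a kinship matrix and is not reproduced here. The function names (`ols`, `mediation`), the simulated effect sizes, and the sample size are all assumptions made for illustration, not values from the study.

```python
import numpy as np

def ols(y, *predictors):
    """Least-squares coefficients for y ~ 1 + predictors (intercept first)."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def mediation(snp, expr, pheno):
    """Split the SNP effect on the phenotype into direct and indirect parts."""
    total = ols(pheno, snp)[1]            # phenotype ~ SNP
    a = ols(expr, snp)[1]                 # expression ~ SNP
    _, direct, b = ols(pheno, snp, expr)  # phenotype ~ SNP + expression
    indirect = a * b                      # effect routed through expression
    return {
        "total": total,
        "direct": direct,
        "indirect": indirect,
        "prop_mediated": indirect / total,
    }

# Simulated example (hypothetical effect sizes, not data from the paper).
rng = np.random.default_rng(0)
n = 2000
snp = rng.integers(0, 2, n).astype(float)          # biallelic SNP in inbred lines
expr = 1.0 * snp + rng.normal(size=n)              # SNP shifts expression
pheno = 1.0 * snp + 2.0 * expr + rng.normal(size=n)  # direct plus mediated effect

res = mediation(snp, expr, pheno)
print(res)
```

In ordinary least squares the total effect equals the direct effect plus the indirect effect, so the proportion mediated can equally be computed as one minus the ratio of the direct to the total effect.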

Keywords: FLC; correlation network; genetic architecture; mediation analysis; flowering time; natural variation.

Figures

Figure 1
A genotype-phenotype model that includes gene expression. The phenotype is affected by a genetic polymorphism that is partly mediated by the expression of a nearby gene, resulting in a direct and indirect genetic effect. Both gene expression and phenotype are also affected by confounding genetic background.
Figure 2
Correlation between flowering time and gene expression levels in the Swedish population. (A) The significance of the GO enrichment for flowering time genes (and implied FDR; see Methods) as a function of the significance threshold for the flowering-expression correlation. (B) Outline of the flowering pathways in A. thaliana (reviewed in, e.g., Kim et al. 2009; Wellmer and Riechmann 2010; Srikanth and Schmid 2011). FLC represses the floral integrator genes FD, FT, and SOC1. FT is induced by the photoperiod pathway through CONSTANS (CO), which is induced by CRYPTOCHROMEs (CRYs); the FT protein is a mobile flowering signal that works with FD to induce SOC1 and floral meristem genes including APETALA1 (AP1), FRUITFUL (FUL), and SEPALATA (SPL3). AGL24 and SOC1 regulate each other in positive feedback loops and induce transcription of LFY. The gibberellin pathway promotes flowering by inducing SOC1 and the floral meristem-identity gene LEAFY (LFY). (C) A correlation network based on gene expression levels. Nodes show flowering time (yellow) and the genes in Table 1 (blue, or orange for the a priori gene set). Edges show significant correlations between nodes (with Bonferroni correction to control FWER at α=0.01) in pink or blue (for positive and negative correlations, respectively).
Figure 3
Genetic effects on gene expression levels. Effects of local genetic variation were estimated using a variance component analysis and 30-kb windows surrounding each gene in Table 1. The lower panel shows the fraction of expression variation explained by local genetic variation surrounding each gene (cis-effects are along the diagonal), and the top panel shows the number of associations explaining more than 10% of the variation (cf. Table S2).
Figure 4
GWAS for flowering time (A) and FLC expression (B). Gray horizontal lines indicate Bonferroni-corrected 5% significance thresholds and orange arrows in panel A show a priori flowering time genes (from Sasaki et al. 2015); the arrow in B shows the SNP in the FLC region identified in A.
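
For readers unfamiliar with how such a threshold line is placed on a Manhattan plot, here is a minimal sketch of the Bonferroni calculation. The number of tested SNPs used below is a hypothetical round figure, not the marker count from the study, and `bonferroni_threshold` is an illustrative helper name.

```python
import math

def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that controls the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests

n_snps = 1_000_000  # hypothetical number of tested SNPs
thr = bonferroni_threshold(0.05, n_snps)
line = -math.log10(thr)  # height of the horizontal line on a -log10(p) axis
print(thr, line)
```

A SNP is declared significant only if its p-value falls below the per-test cutoff, which is why the line sits higher on the plot as more markers are tested.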
Figure 5
Mediation analysis of flowering time regulation by FLC. (A) Models used. The full model correcting for genetic background is shown on top (LMM, linear mixed-model), and the model without such a correction is shown below (LM, linear model). For details see text. Estimates are shown in blue. (B) Proportion of flowering time variation (r²) explained by SNPFLC and FLC expression under the two models (see text). (C) QQ plots of genome-wide association for flowering time and FLC expression with (blue line) and without (red line) correcting for population structure. (D) The SNPFLC effect that is mediated by expression of each of the genes in Table 1. Red bars indicate that the effect is significant (p<0.05).
Figure 6
Prediction of flowering time. (A) Top: A scatter plot between flowering time and the expression level of FLC, both at 10°C, with histograms for each phenotype illustrating the effect of SNPFLC. Reference and non-reference alleles are shown in blue and red, respectively. The dashed lines are regression lines for each allele. Bottom: predicted vs. observed flowering time. (B) The 10°C model applied to the same population grown at 16°C. (C) The 10°C model applied to a different population grown in the greenhouse. Dashed lines in model fits show 95% confidence intervals.


