Nat Commun. 2019 Aug 23;10(1):3834.
doi: 10.1038/s41467-019-11874-7.

Integrative transcriptome imputation reveals tissue-specific and shared biological mechanisms mediating susceptibility to complex traits

Wen Zhang et al. Nat Commun.

Abstract

Transcriptome-wide association studies integrate gene expression data with common risk variation to identify gene-trait associations. By incorporating epigenome data to estimate the functional importance of genetic variation on gene expression, we generate a small but significant improvement in the accuracy of transcriptome prediction and increase the power to detect significant expression-trait associations. Joint analysis of 14 large-scale transcriptome datasets and 58 traits identifies 13,724 significant expression-trait associations that converge on biological processes and relevant phenotypes in human and mouse phenotype databases. We perform drug repurposing analysis and identify compounds that mimic, or reverse, trait-specific changes. We identify genes that exhibit agonistic pleiotropy for genetically correlated traits that converge on shared biological pathways and elucidate distinct processes in disease etiopathogenesis. Overall, this comprehensive analysis provides insight into the specificity and convergence of gene expression on susceptibility to complex traits.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Comparison of prediction performance between EpiXcan and PrediXcan. EpiXcan and PrediXcan models are trained across multiple tissues (brain, aorta, mammary artery, subcutaneous fat, visceral fat, liver, skeletal muscle, and blood) by leveraging 14 datasets from CMC, STARNET and GTEx. The difference in training performance between EpiXcan and PrediXcan models is compared using the adjusted cross-validation R² (R²_CV) metric. The 14 models are further assessed by estimating the predictive performance (R²_PP) in independent datasets; the training dataset is shown before the arrow and the test dataset after the arrow (G = GTEx and S = STARNET). For a given dataset, we compare R²_CV and R²_PP by estimating the delta value of EpiXcan minus PrediXcan for each gene. Positive and negative delta values indicate genes with higher predictive performance in EpiXcan and PrediXcan, respectively. These genes are assigned as “EpiXcan” and “PrediXcan”, and counts are shown as barplots. The number on the right indicates the ratio of “EpiXcan” assigned gene counts divided by “PrediXcan” counts. Across all datasets, the ratios are higher than 1, indicating that EpiXcan outperforms PrediXcan. The p value from a one-sample sign test indicates that the shift of the delta R²_CV and R²_PP values is greater than zero (all p values < 9 × 10⁻¹⁶).
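The per-gene comparison described in this caption (delta values, “EpiXcan” versus “PrediXcan” counts, their ratio, and a one-sample sign test) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `sign_test_summary`, the toy R² values, and the choice of an exact one-sided binomial test with ties dropped are all assumptions.

```python
from math import comb


def sign_test_summary(r2_epixcan, r2_predixcan):
    """Summarize per-gene R² deltas (EpiXcan minus PrediXcan).

    Hypothetical helper: returns the counts of genes favoring each method,
    their ratio, and a one-sided exact sign-test p value for "delta > 0".
    """
    deltas = [e - p for e, p in zip(r2_epixcan, r2_predixcan)]
    n_epi = sum(d > 0 for d in deltas)   # genes assigned to "EpiXcan"
    n_pred = sum(d < 0 for d in deltas)  # genes assigned to "PrediXcan"
    n = n_epi + n_pred                   # ties (delta == 0) are dropped

    # Under the null, each non-tied gene favors either method with
    # probability 0.5, so the count follows Binomial(n, 0.5).
    # One-sided p value: P(X >= n_epi).
    p_value = sum(comb(n, k) for k in range(n_epi, n + 1)) / 2 ** n

    ratio = n_epi / n_pred if n_pred else float("inf")
    return {
        "n_epixcan": n_epi,
        "n_predixcan": n_pred,
        "ratio": ratio,
        "p_value": p_value,
    }


# Toy values for five genes (illustrative only, not data from the paper).
summary = sign_test_summary(
    [0.30, 0.25, 0.10, 0.40, 0.22],
    [0.28, 0.20, 0.12, 0.35, 0.20],
)
```

A ratio above 1 together with a small p value corresponds to the pattern the caption reports for all 14 datasets.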
Fig. 2
Comparison of gene-trait associations between EpiXcan and PrediXcan. a EpiXcan and PrediXcan pairwise Wilcoxon test p value distributions for all gene-trait associations. The quantile-quantile (QQ) plot of the p values for all gene-trait associations shows a significant, albeit modest, shift to the left. The genomic inflation factor (λ) is slightly higher for EpiXcan than for PrediXcan (1.17 and 1.16, respectively). The two distributions are significantly different (Kolmogorov-Smirnov test p value = 3.3 × 10⁻¹⁶), and EpiXcan achieves an 8.47% improvement in effective sample size for common predictions based on the χ² test percentage improvement. b EpiXcan and PrediXcan have a high correlation of gene-trait association z scores. Scatter plot of EpiXcan and PrediXcan z scores; Pearson r = 0.92 and Spearman ρ = 0.91, p value < 2.22 × 10⁻¹⁶ for both. Only z scores between −10 and 10 are plotted. The dotted blue line corresponds to y = x. c Gene set enrichment analysis (GSEA) for extremely loss-of-function intolerant (pLI ≥ 0.9) genes. Odds ratios with 95% CI are plotted for combined gene-trait associations from all traits and trait categories for enrichment in genes with pLI ≥ 0.9 (* for q value < 0.05). For enrichment across all pLI decile bins, refer to Supplementary Data 4. d EpiXcan has more power than PrediXcan to detect expression changes of trait-specific, clinically significant genes. These density plots depict the distribution of the Δ[z] (EpiXcan − PrediXcan) values for all gene-trait associations that are significant from either EpiXcan or PrediXcan. The p value is from a one-sample sign test. The ratio is the number of Δ[z] measurements in favor of EpiXcan to that of PrediXcan. The red lines correspond to the mean of each distribution.
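The caption reports a genomic inflation factor (λ) without stating how it was computed. A common definition is the median of the observed 1-df χ² statistics divided by the median expected under the null (about 0.455). The sketch below assumes that definition and two-sided p values as input; the function name `genomic_inflation` is hypothetical.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

# Median of a chi-square distribution with 1 degree of freedom (~0.4549),
# obtained as the squared lower quartile of the standard normal.
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.25) ** 2


def genomic_inflation(p_values):
    """Estimate lambda from two-sided p values (assumed definition).

    Each p value is mapped back to a 1-df chi-square statistic via the
    standard normal quantile; p values must lie in (0, 1].
    """
    chi2 = [_STD_NORMAL.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN
```

Under this definition, p values whose median is 0.5 give λ = 1, and values above 1 indicate that the test statistics are shifted toward significance relative to the null.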
Fig. 3
Contribution of GWAS and tissues to gene-trait associations. a Correlation of genetically regulated expression imputed for different tissues (pooled GTAs for all traits). Correlation is calculated for significant imputed expression changes with the Spearman method. Dendrogram on the right edge is shown from Ward hierarchical clustering. b Enrichment of tissue-specificity of significant EpiXcan GTAs compared to a null model, where each tissue contributes equally (Pearson’s χ² test p value = 2.7 × 10⁻⁸). Statistically, the enrichment is the Pearson standardized residual for each tissue-trait pair from the χ² test. Box size and color indicate enrichment (red) or depletion (blue) for each tissue-trait pair. Only traits with expected frequency of more than 1 significant gene-trait association for each tissue model are evaluated as per Pearson’s χ² test requirements. Tissues and traits are ordered based on Ward hierarchical clustering. Right-hand side panel indicates tissue-specificity enrichment score.
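The enrichment measure in panel b, a residual per tissue-trait pair from a χ² test of independence, can be illustrated with a small sketch. The caption's "Pearson standardized residual" is read here as the adjusted standardized residual, (O − E) / sqrt(E (1 − row share) (1 − column share)); this reading, the function name `standardized_residuals`, and the toy counts are assumptions, and the authors may instead have used the plain Pearson residual (O − E) / sqrt(E).

```python
from math import sqrt


def standardized_residuals(table):
    """Adjusted standardized residuals for a contingency table.

    `table` is a list of rows (e.g. tissues) of observed counts per
    column (e.g. traits). Positive values indicate enrichment and
    negative values indicate depletion relative to independence.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    residuals = []
    for i, row in enumerate(table):
        out_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            variance = (
                expected
                * (1 - row_totals[i] / n)
                * (1 - col_totals[j] / n)
            )
            out_row.append((observed - expected) / sqrt(variance))
        residuals.append(out_row)
    return residuals


# Toy 2 x 2 table of significant association counts (illustrative only).
toy = standardized_residuals([[10, 20], [20, 10]])
```

In the toy table, the first tissue is depleted for the first trait and enriched for the second, which is the kind of contrast the blue and red boxes in the panel encode.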
Fig. 4
Leveraging gene-trait associations for computational drug repurposing. a Trait-associated genes are used to sort a library of drug-induced gene expression signatures according to their connectivity with the trait. GReX: genetically regulated expression. b A secondary enrichment analysis on this drug list identifies pharmacological features that are over-represented at the extreme ends of the sorted list, thus presenting a chemogenomic view of the trait. c Drug targets linked with each trait (FDR < 0.1) are then (d) compared with risk loci genes for a range of diseases or phenotypes (FDR < 0.1). e Top 10 compounds predicted to normalize the expression of “Hip adjusted BMI” associated genes. f Subset of side-effect enrichments for phenotypically related traits. g Subset of traits with associated drug targets that are enriched for risk-associated gene sets with phenotypically related traits.
Fig. 5
Trait-trait correlations and gene-trait associations. a Network indicating shared genes within/across trait categories. Only traits that have more than 50 associated genes are showcased. Edge width denotes the number of shared genes for each trait pair. The node size indicates the number of gene-trait associations for a given trait. Blue edges denote within-category trait associations and orange edges denote across-category trait associations. The analysis is based on significantly associated genes with FDR ≤ 0.5%. b Scatter plot of genetic correlation (rg) and genetically regulated gene expression (rGReX) for each pairwise trait combination. Standard error is shown with gray lines; rg and rGReX are highly correlated (Pearson’s r = 0.8, p value < 2.79 × 10⁻¹²⁶). c Causal trait network of CAD. CAD and up to two traits upstream are plotted in this network graph to demonstrate causal (arrows) and protective (bar-headed lines) relationships as estimated by bi-directional regression analysis. The trait nodes are colored based on the parent causal trait network of all the traits of the study (Supplementary Fig. 21); nodes that have more children than parent nodes are a darker shade of red and blue, respectively. In edges, width denotes absolute beta, redder color denotes lower p value, and the 2× or 3× labels denote that the relationship is identified in 2 or 3 tissues, respectively. The analysis is based on genes with FDR ≤ 1%, and only the relationships with p value ≤ 0.05 are shown. d Graph depicting the odds ratio of pathway enrichment for CAD agonistic genes shared with traits involved in the causal network. Briefly, for causal traits, a list of genes (with unadjusted p value ≤ 0.05) that are predicted to change in the same direction (or the opposite direction for protective traits) is used for GSEA for common pathways. In this graph only the top 15 results (based on q value) are shown, ranked by odds ratio; an asterisk (*) indicates results that have q value ≤ 0.05. Error bars represent 95% CI for each enrichment.
