Cell. 2016 Nov 17;167(5):1398-1414.e24.
doi: 10.1016/j.cell.2016.10.026.

Genetic Drivers of Epigenetic and Transcriptional Variation in Human Immune Cells

Lu Chen  1 Bing Ge  2 Francesco Paolo Casale  3 Louella Vasquez  4 Tony Kwan  2 Diego Garrido-Martín  5 Stephen Watt  4 Ying Yan  4 Kousik Kundu  1 Simone Ecker  6 Avik Datta  7 David Richardson  7 Frances Burden  8 Daniel Mead  4 Alice L Mann  4 Jose Maria Fernandez  9 Sophia Rowlston  8 Steven P Wilder  10 Samantha Farrow  8 Xiaojian Shao  2 John J Lambourne  11 Adriana Redensek  2 Cornelis A Albers  12 Vyacheslav Amstislavskiy  13 Sofie Ashford  8 Kim Berentsen  14 Lorenzo Bomba  4 Guillaume Bourque  2 David Bujold  2 Stephan Busche  2 Maxime Caron  2 Shu-Huang Chen  2 Warren Cheung  2 Oliver Delaneau  15 Emmanouil T Dermitzakis  15 Heather Elding  4 Irina Colgiu  16 Frederik O Bagger  17 Paul Flicek  7 Ehsan Habibi  14 Valentina Iotchkova  18 Eva Janssen-Megens  14 Bowon Kim  14 Hans Lehrach  13 Ernesto Lowy  7 Amit Mandoli  14 Filomena Matarese  14 Matthew T Maurano  19 John A Morris  2 Vera Pancaldi  9 Farzin Pourfarzad  20 Karola Rehnstrom  8 Augusto Rendon  21 Thomas Risch  13 Nilofar Sharifi  14 Marie-Michelle Simon  2 Marc Sultan  13 Alfonso Valencia  9 Klaudia Walter  4 Shuang-Yin Wang  14 Mattia Frontini  22 Stylianos E Antonarakis  15 Laura Clarke  7 Marie-Laure Yaspo  13 Stephan Beck  23 Roderic Guigo  24 Daniel Rico  25 Joost H A Martens  14 Willem H Ouwehand  26 Taco W Kuijpers  27 Dirk S Paul  28 Hendrik G Stunnenberg  14 Oliver Stegle  3 Kate Downes  8 Tomi Pastinen  29 Nicole Soranzo  30
Free PMC article

Lu Chen et al. Cell. 2016.

Abstract

Characterizing the multifaceted contribution of genetic and epigenetic factors to disease phenotypes is a major challenge in human genetics and medicine. We carried out high-resolution genetic, epigenetic, and transcriptomic profiling in three major human immune cell types (CD14+ monocytes, CD16+ neutrophils, and naive CD4+ T cells) from up to 197 individuals. We assess, quantitatively, the relative contribution of cis-genetic and epigenetic factors to transcription and evaluate their impact as potential sources of confounding in epigenome-wide association studies. Further, we characterize highly coordinated genetic effects on gene expression, methylation, and histone variation through quantitative trait locus (QTL) mapping and allele-specific (AS) analyses. Finally, we demonstrate colocalization of molecular trait QTLs at 345 unique immune disease loci. This expansive, high-resolution atlas of multi-omics changes yields insights into cell-type-specific correlation between diverse genomic inputs, more generalizable correlations between these inputs, and defines molecular events that may underpin complex disease risk.

Keywords: DNA methylation; EWAS; QTL; allele specific; histone modification; immune; monocyte; neutrophil; T cell; transcription.

Figures

Figure 1
Study Design Overview of study design and molecular traits investigated. Details of sample collections are given in Figure S1 and Table S1.
Figure 2
Variance Decomposition and Epigenetic Association Analysis of Gene Expression (A) Mechanisms of genetic and epigenetic associations with gene expression. Considered are direct cis-acting genetic effects (light blue) as well as epigenetic correlations with gene expression that are independent of genetics (dark blue). No assumption is made on causal directionality for shared genetic effects (light blue, dashed line). (B) Proportion of transcriptome variance explained by genetic and epigenetic factors for individual genes, when considering putative cis-regulatory elements (within ±1 Mb of the gene body). Shown is the cumulative contribution of genes with increasing proportions of explained variance, considering genetic factors (blue), DNA methylation (orange), H3K4me1 (violet), and H3K27ac (pink) in monocytes. Epigenetic variance components were estimated either with (solid lines, G-corrected) or without (dashed lines, uncorrected) accounting for local cis-genetic variation (Methods). (C) Scatterplot of the proportion of variance explained by cis-genetics (x axis) versus cis-epigenetic (y axis) effects in monocytes. Significant variance components (VCs, FDR <5%) are coded in color. (D) Overlap of genes with significant cis-genetic and cis-epigenetic contributions to expression variance. (E) Overlap of genes with significant contributions from (cis) DNA methylation, (cis) H3K4me1, and (cis) H3K27ac. (F) Manhattan plot for gene TMEM176A obtained from the cis-epigenetic association analysis of gene expression in monocytes. Top panel: analysis without accounting for cis-genetic variation. Bottom panel: analysis when accounting for cis-genetic variation. (G) Fraction of genes with significant epigenetic associations (epiGenes, FDR <5%) before (uncorrected) and after correcting (G-corrected) for common cis-genetic variation. 
For T cells, a lower number of ChIP-seq data for H3K4me1 was due to lower initial immunoprecipitation enrichment for a subset of cryopreserved samples with insufficient material for repeat assays; hence only methylation was used in this analysis. See also Figures S5 and S6 and Tables S3 and S4.
Figure 3
Features, Cell-type Specificity, and Coordination of QTLs (A) Number of protein-coding and non-coding genes with significant eQTL (FDR <5%). (B) Number of phenotypes associated with the same QTL. (C) Percentage of phenotypes that are cell-type-specific (top) and genome-wide patterns of QTL sharing (π1 statistics) among cell types (bottom). (D) Correlation (Pearson) between effect sizes for QTLs shared between different cell types. (E) Percentage of eSNPs also associated (r2 ≥ 0.8) with H3K27ac and H3K4me1 (left) or methylation levels (right). (F) Correlation (Pearson) between effect size of expression and other molecular trait QTLs at overlapping signals (LD ≥0.8). (G) Fold-enrichment of eQTLs, hQTLs, and meQTLs in different chromatin segmentation states. See also Figures S2, S3, and S4 and Tables S2, S3, and S4.
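The π1 statistic in panel C is an estimate of the proportion of true associations among a set of tests. A minimal sketch, assuming a Storey-type estimator with a single fixed threshold λ (the caption does not say which estimator or tuning the authors used; `pi1_estimate` and `lam` are illustrative names):

```python
def pi1_estimate(p_values, lam=0.5):
    """Estimate pi1 = 1 - pi0 from a list of p-values.

    Assumption: p-values above `lam` come almost entirely from null tests,
    so their density estimates pi0, the proportion of true nulls.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1)")
    m = len(p_values)
    if m == 0:
        raise ValueError("p_values must not be empty")
    # Count p-values in the region assumed to be dominated by nulls.
    n_above = sum(1 for p in p_values if p > lam)
    # Rescale by the width of that region; cap at 1 so pi1 stays non-negative.
    pi0 = min(1.0, n_above / (m * (1.0 - lam)))
    return 1.0 - pi0
```

In a sharing analysis, `p_values` would be the p-values that QTLs discovered in one cell type obtain in a second cell type; a value near 1 suggests that most of them replicate.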
Figure 4
Features, Cell-type Specificity, and Examples of Splicing QTLs (A) Number of protein-coding, non-coding gene, and unannotated events with a significant splicing QTL (FDR <5%). (B) Percentage of different alternative splicing events from PSI (top) and ISO (bottom) analyses. (C) Percentage of PSI and ISO events that are cell-type-specific (top), and genome-wide patterns of QTL sharing (π1 statistics) among the three cell types (bottom). (D) Probability distribution of lead eQTL and sQTL SNPs around genes. (E–G) Examples of alternatively spliced genes showing transcript structure and their distribution based on genotypes at each ISO sQTL. (E) IRF5 and rs3807306, a RA-predisposing SNP that is associated with the switch of two major isoforms that have alternative 5′ UTR in neutrophils. (F) BTNL8 gene structure and rs47007720, which switches a protein-coding major isoform to a non-coding isoform with intron retention in neutrophils. (G) GBP3 gene structure and rs10922542, which switches a protein-coding major isoform to a nonsense-mediated-decay isoform and involves an exon skipping event in T cells. See also Figure S3 and Tables S2, S3, and S4.
Figure 5
Features of Molecular Traits Revealed by Allelic Analyses (A) Relationship of significant allelic expression imbalance and mapped common cis-regulatory SNPs. Nearly 90% of transcripts show <1.5-fold difference between maternal and paternal copy (green line) with >2-fold differences seen in only ∼3% of transcripts. The primary (blue bar) or secondary (light blue) ASE mapped SNPs account for the majority of significant allelic effects, because homozygosity for these cis-rSNPs (red bars) is observed in only ∼7% of cases with allelic imbalances >3-fold. (B) Coordinated genetic effects for genes and local chromatin peaks (lead SNPs r2 ≥ 0.8) are approximately four times more numerous (blue bars “Gene-peak +ve” AS+QTL) when both allelic and QTL mapping hits are considered as compared to QTL mapped hits alone (blue bars “Gene-peak +ve” QTL) and can be validated in up to 47% of cases (green line) by intra-individual allelic correlation among genes and peaks. Genes with QTLs (QTL or AS+QTL) without coordinated genetic effects do not show (<5%) allelic correlation of local peaks. (C) Validated gene TSS/peak allelic coordination (arcs scaled by Pearson r2). Three H3K27ac (blue arcs) and one H3K4me1 (red) elements are allelically linked to ARID5B, with similar allelic coordination for MTAP, while HOTAIRM1 is linked to multiple regulatory elements. For ARID5B and MTAP, the underlying SNPs (red [-log10] p value track “eQTL Pv”) overlap a coordinated peak as well as a GWAS variant (green NHGRI GWAS catalog SNPs on bottom) linked to rheumatoid arthritis and nevus counts, respectively. (D) Disease locus functional phenotype captured solely in allele-specific analyses. IL2RA SNP (rs12722489) is associated with multiple sclerosis and Crohn’s disease and is the top SNP for an H3K27ac CHT event spanning the transcript (blue bar); the top IL2RA CHT SNP is in high LD (r2 = 0.8) with the chromatin allelic signal. Allelic variations between gene and H3K27ac among individuals are extremely highly correlated (Pearson r2 > 0.95, blue arc), suggesting that allelic chromatin altered by disease SNP can lead to differential allelic expression of IL2RA. See also Figure S7 and Table S6.
Figure 6
Allelic Behavior of Locally Correlated eQTLs Examples of modes for clustering of “cis-eQTLs.” Top to bottom: gene annotations (blue), eQTL pair sharing same top association (blue and red rectangles), local RNA-signal (fwd and rev strand; black), H3K4me1 (red) and H3K27ac (blue), average (log) RNA-seq intensity among top eQTL SNP homozygotes (AA, red; BB, green), top SNP (blue tick and rsID), eQTL mapping result (-log10 p value track in blue), allelic expression deviation (equal expression = 0, monoallelic expression = |0.5|) among top QTL SNP heterozygotes in forward (black) and reverse (gray) strands, allelic H3K27ac (blue), and H3K4me1 (red) deviation among top QTL SNP heterozygotes. (A) “Head-to-head” configuration of eQTL and allelic effect, where total and allelic difference is mapped to a variant in a bidirectional promoter. (B) Local SNP altering both chromatin and reverse and forward strands across multiple transcripts and chromatin signal. (C) Example of a putative “cis-trans” pair where B3GALNT2 shows strong overexpression of one genotype and consistent allelic effect with eSNP localizing to its promoter, which also alters expression level of GGPS1 without detectable allelic effect. See also Figure S7 and Table S6.
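The allelic deviation scale quoted above (0 for equal expression, |0.5| for monoallelic expression) is consistent with a reference-allele read fraction shifted by 0.5. A minimal sketch under that assumption; the caption gives only the endpoints of the scale, so the formula, the function name, and the read-count inputs are all hypothetical:

```python
def allelic_deviation(ref_reads, alt_reads):
    """Return the reference-allele read fraction minus 0.5.

    0.0 means both alleles are equally expressed; +0.5 or -0.5 means
    that all reads come from a single allele (monoallelic expression).
    """
    total = ref_reads + alt_reads
    if total <= 0:
        raise ValueError("need at least one read at the heterozygous site")
    return ref_reads / total - 0.5
```

For example, 75 reference reads and 25 alternative reads at a heterozygous site give a deviation of 0.25, halfway between balanced and monoallelic expression.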
Figure 7
Molecular Mechanisms at Autoimmune Disease Loci (A) Enrichment in molecular QTLs of celiac disease (CEL), Crohn’s disease (CD), inflammatory bowel disease (IBD), ulcerative colitis (UC), multiple sclerosis (MS), rheumatoid arthritis (RA), and type 1 (T1D) and type 2 diabetes (T2D). (B) N overlap = Number of observed QTL-trait pairs (top table) or unique disease loci (bottom table) that overlap (r2 ≥ 0.8) disease variants across all three cell types. Disease colocalized = number and proportion of overlapping pairs that colocalize with disease variants with PP3 ≥ 0.99. FE = Ratio of fold enrichment of these proportions over eQTLs. (C) Number (%) of disease loci colocalizing with cell-type-specific molecular QTLs, for associations unique to M, N, T, or shared between two or three cell types. (D–G) Examples of colocalization between disease and molecular traits. Each plot shows regional association (window 2 Mb centered on the significant peak) for a given disease locus (gray), molecular mark (color) and cell type, and corresponding molecular trait signal coverage (log2 RPM, 20 kb). (D) PSI sQTL. (E) eQTL/meQTL. (F) eQTL/hQTL. (G) hQTL with no corresponding eQTL. See Table S5.
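The r2 ≥ 0.8 overlap criterion in panel B refers to linkage disequilibrium between a QTL lead SNP and a disease variant. A minimal sketch of the standard r² calculation from phased haplotypes, assuming 0/1 allele coding (the function name and inputs are hypothetical; how the study itself computed LD is not described here):

```python
def ld_r2(hap_a, hap_b):
    """Squared correlation (r^2) between two biallelic SNPs.

    hap_a and hap_b list the allele (0 or 1) carried at each SNP
    by the same phased haplotypes, in the same order.
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype lists must be non-empty and equal length")
    p_a = sum(hap_a) / n  # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n  # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom
```

Under the caption's criterion, a QTL-trait pair would count as overlapping a disease variant when this value is at least 0.8.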
Figure S1
Sample Collection, Related to Figure 1 (A) Morphological assessment of purified cell preparations. Cells were fixed to slides using a Cytospin and stained using Wright-Giemsa stain prior to photographing at 100x magnification. (B) Examples of neutrophil, monocyte, and naive CD4+ T cell staining to assess purity of cell preparations. (C) Histogram of cell purity based on FACS analysis in three cell types. (D) Details of data production centers. Data from this project were produced in different institutes as detailed here: University of Cambridge (UCAM), European Bioinformatics Institute (EBI), Wellcome Trust Sanger Institute (WTSI), Nijmegen Centre for Molecular Life Sciences (NCMLS), University College London (UCL), McGill University (McGill), and Max Planck Institute for Molecular Genetics (MPIMG). Peripheral blood mononuclear cells (PBMC) were isolated from donors at UCAM, and from these, monocytes (M), neutrophils (N), and naive CD4+ T cells (T) were extracted, with a further aliquot used as a source of genomic (g)DNA samples. gDNA was shipped to the WTSI for sequencing; the monocyte/neutrophil samples were divided between MPIMG, UCL, and WTSI+NCMLS for RNA-seq, DNA methylation sequencing (Methylation), and ChIP-seq, respectively; and the naive CD4+ T cells were sent to McGill for RNA-seq/Methylation/ChIP-seq. In addition, three samples from each institute/assay set were sent to the reciprocal institute for cross-center validation purposes (e.g., RNA-seq assays were carried out on the same three samples at both MPIMG and McGill). Data processing/analysis was carried out at WTSI for WGS and RNA-seq, UCL for DNA methylation sequencing, and EBI for ChIP-seq.
Figure S2
WGS and DNA Methylation Sample and Data Quality Metrics, Related to Figures 3 and 4 WGS (A-H) and DNA methylation (I-N) sample and data quality metrics. (A) Principal component analysis (PCA) scatterplot of the first two components using the resulting merged datasets (1000GP + Blueprint). The dashed line indicates the arbitrary threshold to discriminate the population of European ancestry. (B) Number of SNPs (×10⁶) by non-reference allele frequency (AF) bins. (C) Number of INDELs (×10⁴) by non-reference AF bins. (D) Size distribution of INDELs. Negative lengths represent deletions and positive lengths represent insertions. (E) Number of SNPs (×10⁶) and INDELs (×10⁴) by sample. (F) Depth of coverage by sample. (G) Ratio of heterozygous and homozygous non-reference SNP genotypes by sample and transition to transversion ratio (Ts/Tv) by sample. (H) Types of substitution in percentage. (I-K) Distributions of DNA methylation M-values for each cell type. Each line represents one sample. (L) Barplot representing the proportions of variance explained by the first ten principal components of a principal component analysis across all samples used in the study. (M) Visualization of the first two principal components of a principal component analysis across all samples used in the study. Each data point represents one sample, colored by cell type. (N) Multidimensional scaling of all samples used in the study, based on Euclidean distances. Each data point represents one sample, colored by cell type.
Figure S3
RNA-Seq Distribution and Batch Correction, Related to Figures 3 and 4 (A) PCA before and after batch correction using ComBat at the gene level. Darker lines and dots represent crossover samples from different sequencing centers. Distribution of normalized read counts (log2) at the gene level in monocytes, neutrophils, and naive CD4+ T cells. (B) Scatterplots of the pairwise correlation of gene quantification between crossover samples before and after batch correction. (C) Distribution of PSI values before and after PEER correction (upper panel) and PCA plots (lower panel) in monocytes, neutrophils, and naive CD4+ T cells.
Figure S4
ChIP-Sequencing Data Quality Metrics, Related to Figure 3 (A–D) ChIP-seq quality control plots with consistent color convention throughout: neutrophil (blue), monocyte (green), and T cell (yellow). Plots are split by factor assayed, H3K4me1 (left) and H3K27ac (right). (A) Histogram displaying bins of quality control passed reads on the x axis and percent of individuals falling into each bin on the y axis. (B) Scatterplot displaying, on the x axis, the number of peaks called at the FDR threshold per individual, colored by cell type, and, on the y axis, the fraction of reads intersecting a consensus peak set of regions shared across all three cell types. (C) Histogram displaying bins of normalized strand coefficient on the x axis and percent of individuals that fall into each bin on the y axis. (D) Histogram displaying bins of relative strand coefficient on the x axis and percent of individuals that fall into each bin on the y axis. (E and F) Scatterplot showing the Pearson correlation r between replicates of the same donors processed at NCMLS and McGill. (F) Hierarchical clustering for each histone modification marker using Pearson correlation as the distance metric and standardized log2 RPM (reads per million) on chromosome 1 only. Similar clustering is likewise seen for all chromosomes. (G–J) PEER corrected matrices of log2 RPM. Density of log2 RPM values for H3K27ac in (G) and for H3K4me1 in (I). Scatterplot colored by the ChIP center of the first two orthogonal components from PCA for H3K27ac in (H) and for H3K4me1 in (J).
Figure S5
Variance Decomposition Analyses, Related to Figure 2 (A and B) Figures showing analogous results as those presented in Figure 2B; however, for neutrophils and naive CD4+ T cells. (C–F) Variance partitioning results obtained from the joint model across all four molecular layers in monocytes. Shown are the distributions of variance explained by genetics, cumulative epigenetics as well as separately for individual epigenetic layers for different sets of genes. Specifically, genes were stratified by the median of (C) the total variance explained by the joint model (“low” and “high” indicate genes below and above the median), (D) the median gene-expression level, (E) gene type and (F) the variance of the log of the expression levels. (G and H) Figures showing analogous results as those presented in Figures 2C–2E; however for neutrophils and naive CD4+ T cells. (I) Pairwise correlation of the variance explained by different molecular layers between monocytes and neutrophils. Epigenetic contributions were estimated using a model that accounts for underlying genetic variation (see the STAR Methods). The Spearman’s rank correlation (ρ) is also reported. Venn Diagrams show the overlap of genes with significant genetic, methylation, H3K4me1 and H3K27ac contributions between monocytes and neutrophils (FDR < 5%, using a variance component test, see the STAR Methods). (J) Comparison of variance component estimates for individual molecular layers either considering a model that accounts for expression heterogeneity (EH, y axis) or a model that does not account for EH (no EH, x axis) in monocytes (see the STAR Methods). The genetic variance estimates were consistent across both approaches, whereas epigenetic variance estimates were substantially increased when not using the additional EH adjustment. (K) Comparison of the proportion of variance explained by different molecular layers across cell types when either considering a 100Kb or a 1Mb cis window (see the STAR Methods).
Figure S6
EWAS, Related to Figure 2 (A) Scatterplot of the gene-level P values (see the STAR Methods) obtained from the EWAS analysis either accounting (y axis) or not (x axis) for genetic effects in all three cell types. Genes with significant cis-epigenetic association only when not accounting for underlying genetic effects (“Only without accounting,” FDR < 5%) are indicated in dark blue. Genes with significant cis-epigenetic association only when accounting for underlying genetic effects (“Only accounting”) are indicated in green. Finally, genes with significant cis-epigenetic associations both when accounting or not for underlying genetic effects (“Both”) are indicated in blue. (B) Manhattan plot for the gene MSR1 (ENSG00000038945), illustrating a cis epigenetic association that is robust to correction of genetic effects. Shown are -log10(pv) from an EWAS analysis either without accounting for cis genetic effects (top panel) or when accounting for cis genetic variation (bottom panel).
Figure S7
Distribution of Primary ASE Associations, Related to Figures 5 and 6 (A) Distribution of primary associations with respect to measured transcript for ASE (Blue), CHT (Red), or secondary, conditional ASE (Green) associations. The relative density of associations is adjusted to tested common SNPs in different bins. (B) Enrichment of chromHMM chromatin states for top primary ASE (Blue), primary CHT (Orange), or secondary ASE (gray) associations. The y axis is the fold-enrichment of SNPs in E1-E11 chromHMM states relative to all SNPs tested for association. (C) Enrichment of chromHMM chromatin states for primary ASE associations (Blue), top primary associations overlapping from ASE and QTL tests (Orange), and from QTL tests (gray). The y axis is the fold-enrichment of SNPs in E1-E11 chromHMM states relative to all SNPs tested for association. (D) Proportion of associations versus tested traits. For each type of test (ASE/CHE/ASES) and assay (Gene, H3K27ac, H3K4me1), the proportion of features with a QTL/ASE/ASH at 5% FDR relative to the total number of features tested is shown as a bar graph for each cell type alone (Blue shade), common to two cell types (Red shade), and common to all three cell types (green). The actual number of features at 5% FDR is shown above each bar.
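The fold-enrichment on the y axis of panels B and C can be read as the share of associated SNPs falling in a chromatin state divided by the share of all tested SNPs in that state. A minimal sketch under that reading (the function name and the per-SNP state labels are illustrative assumptions, not the authors' pipeline):

```python
from collections import Counter


def fold_enrichment(associated_states, tested_states):
    """Per-state fold-enrichment of associated SNPs over all tested SNPs.

    Each argument is a list with one chromatin-state label per SNP
    (for example "E1" to "E11"). The associated SNPs are assumed to be
    a subset of the tested SNPs.
    """
    if not associated_states or not tested_states:
        raise ValueError("both SNP lists must be non-empty")
    assoc_counts = Counter(associated_states)
    tested_counts = Counter(tested_states)
    n_assoc = len(associated_states)
    n_tested = len(tested_states)
    enrichment = {}
    for state, n_in_state in tested_counts.items():
        background = n_in_state / n_tested  # share among all tested SNPs
        observed = assoc_counts.get(state, 0) / n_assoc  # share among hits
        enrichment[state] = observed / background
    return enrichment
```

A value above 1 for a state means associated SNPs fall in that state more often than expected from the tested background; a value below 1 indicates depletion.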
