, 114 (9), 2301-2306

Genetic Regulatory Signatures Underlying Islet Gene Expression and Type 2 Diabetes

Arushi Varshney et al. Proc Natl Acad Sci U S A.

Abstract

Genome-wide association studies (GWAS) have identified >100 independent SNPs that modulate the risk of type 2 diabetes (T2D) and related traits. However, the pathogenic mechanisms of most of these SNPs remain elusive. Here, we examined genomic, epigenomic, and transcriptomic profiles in human pancreatic islets to understand the links between genetic variation, chromatin landscape, and gene expression in the context of T2D. We first integrated genome and transcriptome variation across 112 islet samples to produce dense cis-expression quantitative trait loci (cis-eQTL) maps. Additional integration with chromatin-state maps for islets and other diverse tissue types revealed that cis-eQTLs for islet-specific genes are specifically and significantly enriched in islet stretch enhancers. High-resolution chromatin accessibility profiling using assay for transposase-accessible chromatin sequencing (ATAC-seq) in two islet samples enabled us to identify specific transcription factor (TF) footprints embedded in active regulatory elements, which are highly enriched for islet cis-eQTLs. Aggregate allelic bias signatures in TF footprints enabled us to reconstruct TF binding affinities de novo from genetic data, which supports the high quality of the TF footprint predictions. Interestingly, we found that T2D GWAS loci were strikingly and specifically enriched in islet Regulatory Factor X (RFX) footprints. Remarkably, within and across independent loci, T2D risk alleles that overlap with RFX footprints uniformly disrupt the RFX motifs at high-information content positions. Together, these results suggest that common regulatory variations have shaped islet TF footprints and the transcriptome and that a confluent RFX regulatory grammar plays a significant role in the genetic component of T2D predisposition.

Keywords: chromatin; diabetes; eQTL; epigenome; footprint.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Integrated genomic, epigenomic, and transcriptomic analyses of human pancreatic islets. (A) An overview of diverse molecular profiling data types used in this study. Integrative molecular profiling (open chromatin, ATAC-seq; chromatin states; RNA-seq) highlights islet-specific signatures at the KCNK17 locus. (B) Plot of strength of association (y axis) for significant islet cis-eQTLs by chromosomal location (x axis), colored by chromatin-state annotation as in A; diamonds indicate SNPs overlapping ATAC-seq footprints. An interactive version of this plot can be found at theparkerlab.org/tools/isleteqtl/. (C) Plot of strength of islet cis-eQTL association for T2D and related trait GWAS SNPs by chromosomal position, annotated as in B, after conditional analysis to identify variants likely independent of stronger cis-eQTL signals for the same gene. The plot includes all GWAS SNP–gene pairs with FDR < 0.05 in the original cis-eQTL analysis. The dotted red line represents the P value threshold for FDR < 0.05 based on the conditional analysis. (D) Islet cis-eQTL associated with KCNK17 expression highlighted for comparison with molecular profiling tracks in A. (E) Plot of normalized KCNK17 expression in islet samples and cis-eQTL risk allele dosage. (F) Functional validation of KCNK17 cis-eQTL at its promoter region. The haplotype containing alleles associated with T2D risk and increased KCNK17 expression (rs10947804-C, rs12663159-A, rs146060240-G, and rs34247110-A) shows higher transcriptional activity than the haplotype with nonrisk alleles. The cloned region is indicated at the top of A. Relative luciferase activity is given as mean ± SD of four to five independent clones per haplotype normalized to empty vector. Significance was evaluated using a two-sided t test.
Fig. S1.
Thirteen-chromatin-state model built from histone modification ChIP-seq data generated using ChromHMM (9) for 33 cell types (Table S2). (A) Each graph represents the overlap enrichment for 18 cell types of each of our 13 generated chromatin states with the Roadmap Epigenomics (8) reported states. (B) Renaming of generated 13 states (Original State) according to Roadmap Epigenomics overlap enrichments (New State) in A. (C) State numbers, histone mark emission probabilities, state names, and percentage genomic coverage of each chromatin state in human islets.
Fig. S2.
(A) LocusZoom plot showing that a T2D GWAS SNP (rs1535500/chr6:39284050, hg19, purple; other variants in LD colored according to r²) is not associated with KCNK16 expression in islets. (B) Plot for normalized KCNK16 expression and rs1535500 risk allele dosage from mRNA-seq and genotyping data in islet samples.
Fig. S3.
Fold enrichment of islet eQTLs in chromatin states across cells/tissues.
Fig. S4.
iESI. (A) Heat map showing mean FPKM for genes expressed in different tissues when binned by iESI quintiles. (B) Scatterplots showing FPKM for genes expressed in different tissues vs. the iESI. (C) Distribution of iESI by quintile of expression.
Fig. S5.
Enrichment of islet cis-eQTLs binned into quintiles by target gene iESI in islet active TSS and stretch enhancer chromatin states (red) and consensus islet intersect ATAC-seq peaks (present in both islet samples) in these states (blue). *P < 0.05 from GREGOR analysis.
Fig. S6.
Common and islet-specific gene eQTLs are enriched in different chromatin states. (A) LocusZoom plot of an islet cis-eQTL in the KCNA6 locus. (B) The cis-eQTL for KCNA6, which is in the top quintile of the iESI (iESI 5), overlaps an islet-specific enhancer state. (C) Active enhancer clustering (y axis) across cell types (x axis) reveals cell-specific enhancer regions. Cluster 13 is islet-specific. (D) Degree of overlap of enhancer clusters with stretch enhancers from four cell types. Islet stretch enhancers show the strongest overlap with islet-specific enhancer cluster 13, whereas GM12878 stretch enhancers show the strongest overlap with GM12878-specific enhancer cluster 1. The Jaccard statistic was normalized per column, so that values range from zero (no overlap) to one (maximum observed overlap). (E) Enrichment of islet eQTLs across enhancer clusters reveals that the full set of eQTLs (column 1) is enriched across multiple enhancer clusters, whereas eQTLs for islet-specific genes (iESI quintile 5; column 5) are enriched in the islet-specific enhancer cluster 13. Gray bars indicate nonsignificant after Bonferroni correction.
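The per-column normalization of the Jaccard statistic described for Fig. S6D can be illustrated with a minimal sketch. It assumes that regions are represented as sets of hypothetical genomic bin IDs, that rows are enhancer clusters and columns are stretch-enhancer sets, and that "normalized per column" means dividing each column by its maximum; none of these details are stated in the caption, and the function names are illustrative only.

```python
def jaccard(a, b):
    """Jaccard statistic of two sets: |A intersect B| / |A union B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def column_normalize(matrix):
    """Divide every column by its maximum so the largest value in each
    column becomes 1 and entries with no overlap stay at 0.
    (Assumed interpretation of 'normalized per column'.)"""
    n_rows, n_cols = len(matrix), len(matrix[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        col_max = max(matrix[i][j] for i in range(n_rows))
        for i in range(n_rows):
            out[i][j] = matrix[i][j] / col_max if col_max > 0 else 0.0
    return out


# Toy data: bin IDs covered by two enhancer clusters (rows)
# and two stretch-enhancer sets (columns). All values are made up.
clusters = [{1, 2, 3, 4}, {10, 11, 12}]
stretch_enhancers = [{3, 4, 5, 6}, {11, 12, 13}]

raw = [[jaccard(c, s) for s in stretch_enhancers] for c in clusters]
normalized = column_normalize(raw)
```

Because each column is scaled independently, values are comparable across rows within one column but not across columns.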
Fig. 2.
Nucleotide resolution islet ATAC-seq profiling nominates regulatory mechanisms. (A) RFX6 locus with expression (RNA-seq), chromatin states, open chromatin (ATAC-seq), and footprints for CTCF and RFX in islets. (B) Density plots indicating normalized sequence coverage of ATAC-seq from two human islet samples at sites overlapping CTCF (motif = CTCF_known2) and RFX (motif = RFX2_4) motifs. (C) Log2 fold enrichment of islet cis-eQTLs in TF footprint motifs compared with their enrichment in TF nonfootprint motifs. TFs for which footprint and nonfootprint motifs overlap four or more eQTL SNPs are shown. Blue shows significant enrichment in footprints only (Bonferroni corrected P < 0.05). No significant enrichment was observed in any TF nonfootprint motif. (D) Reconstruction of CTCF (motif = CTCF_known2) and RFX (motif = RFX2_4) motifs using ATAC-seq TF footprint allelic bias data. Row 1: original motif PWM. Row 2: PWM genetically reconstructed using the overrepresented alleles (and extent of overrepresentation) for SNPs with significant ATAC-seq allelic bias. Row 3: count of nucleotides in SNPs with significant allelic bias. Row 4: PWM reconstructed using the count of nucleotides for heterozygous SNPs in the TF footprint. Row 5: count of nucleotides in heterozygous SNPs in the TF footprint.
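As a rough illustration of how a PWM can be rebuilt from per-position nucleotide counts (as in rows 4 and 5 of Fig. 2D), the sketch below converts counts to column-wise frequencies. It is not the authors' procedure: the function name, the optional pseudocount, and the choice to return a uniform column where no SNPs were observed are all assumptions made for this example, and the allelic-bias weighting used for row 2 is not modeled.

```python
BASES = "ACGT"


def counts_to_pwm(counts, pseudocount=0.0):
    """Turn per-position nucleotide counts into a frequency matrix.

    counts: list of dicts, one per motif position, mapping base -> count.
    Returns a list of [A, C, G, T] frequency lists.
    """
    pwm = []
    for col in counts:
        totals = [col.get(base, 0) + pseudocount for base in BASES]
        column_sum = sum(totals)
        if column_sum == 0:
            # Assumption: positions with no observed SNPs get a uniform column.
            pwm.append([0.25] * 4)
        else:
            pwm.append([t / column_sum for t in totals])
    return pwm


# Made-up counts for a three-position motif.
toy_counts = [{"A": 3, "C": 1}, {}, {"G": 2, "T": 2}]
toy_pwm = counts_to_pwm(toy_counts)
```

Each row of the result sums to 1, so it can be compared position by position with the original motif PWM.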
Fig. S7.
Enrichment of islet, muscle, GM12878, and adipose ATAC-seq peaks (columns) in chromatin states across diverse tissues (y axis). Consensus (islet intersection) and individual (islets 1 and 2) islet ATAC-seq peaks show enrichment for active chromatin states in islets, which is more pronounced at TSS-distal (>5 kb from TSS) regions. Muscle (column 4), GM12878 (column 5), and adipose (column 6) ATAC-seq peak calls show similar trends with chromatin states from matched tissues. Note that TSS-distal ATAC-seq peaks from the islet intersect dataset overlap islet active enhancers more than any other chromatin state in islets. Note also that the level of islet enhancer overlap is larger than enhancer overlap in any other tissue.
Fig. S8.
Enrichment of islet cis-eQTLs (5% FDR) in ATAC-seq TF footprints that are only detected using phased SNP-aware scans (Materials and Methods). *P < 0.05 from GREGOR analysis.
Fig. S9.
SNPs that show allelic bias in ATAC-seq data (ab; blue box plot) exhibit larger effects on the predicted TF binding site motifs compared with randomly sampled 1000G SNPs (1000G; red box plot) overlapping the same footprint in islet 1. The y axis shows the absolute value of the delta score [delta = −log10(FIMO P value of alternate sequence) − (−log10(FIMO P value of reference sequence))]. P values of the comparisons were determined by the Wilcoxon rank sum test. (A) Footprint motif = RFX2_4. (B) Footprint motif = CTCF_known2.
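The delta score defined in the Fig. S9 caption can be written out directly. The sketch below assumes the two FIMO P values are already available as plain floats; the function names and the example P values are hypothetical and are not taken from the study.

```python
import math


def delta_score(p_ref, p_alt):
    """delta = -log10(P_alt) - (-log10(P_ref)).

    Positive when the alternate allele gives a stronger motif match
    (smaller FIMO P value) than the reference allele.
    """
    return -math.log10(p_alt) - (-math.log10(p_ref))


def abs_delta_score(p_ref, p_alt):
    """Absolute delta score, the quantity plotted on the y axis."""
    return abs(delta_score(p_ref, p_alt))


# Made-up example: the alternate allele weakens the match 100-fold.
example = abs_delta_score(p_ref=1e-6, p_alt=1e-4)
```

Taking the absolute value treats motif-strengthening and motif-weakening SNPs alike, so the comparison is about effect size rather than direction.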
Fig. 3.
T2D GWAS enrichment at islet footprints reveals confluent RFX motif disruption. (A) T2D GWAS SNPs are significantly enriched in RFX motifs in islet footprints but not in control motifs or footprints from a nondisease-relevant cell type (GM12878). TF motifs for which footprints overlap four or more T2D GWAS SNPs are shown. The red line indicates Bonferroni multiple testing threshold. (B) T2D-associated SNPs that overlap high information content (>1 bit) positions in RFX motifs. The highest scoring RFX footprints are reported for each T2D GWAS SNP. Act. Enh., active enhancer; Act. TSS, active TSS; Wk. Transc., weak transcribed. *Chromatin-state annotation overlapping the SNP. Because RFX motifs in C are organized by alignment to the longest RFX3_1 motif, motifs overlapping rs10947804 and rs1716165 correspond to the reverse complement sequence. Therefore, risk and nonrisk alleles are also reported as reverse complement relative to the plus strand sequence. (C) Alignment of highest scoring RFX footprint at each SNP; the boxes indicate the SNP overlap positions. Note that, in every case, the risk allele disrupts that motif.
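The "high information content (>1 bit)" criterion in Fig. 3B can be illustrated with the standard per-position information content of a DNA motif, IC = 2 + Σ p·log2(p). The sketch assumes a uniform background over the four bases and a PWM given as [A, C, G, T] frequency lists; the caption does not say how information content was computed, and the function names and toy matrix are illustrative.

```python
import math


def information_content(column):
    """Information content in bits of one PWM column of base frequencies,
    assuming a uniform background: 2 + sum(p * log2(p))."""
    return 2.0 + sum(p * math.log2(p) for p in column if p > 0)


def high_information_positions(pwm, threshold=1.0):
    """Return 0-based indices of motif positions whose information
    content is strictly greater than the threshold (in bits)."""
    return [i for i, col in enumerate(pwm) if information_content(col) > threshold]


# Toy four-position motif (frequencies for A, C, G, T); values are made up.
toy_pwm = [
    [1.0, 0.0, 0.0, 0.0],      # fully conserved position
    [0.25, 0.25, 0.25, 0.25],  # uninformative position
    [0.5, 0.5, 0.0, 0.0],      # two equally likely bases
    [0.9, 0.1, 0.0, 0.0],      # strongly biased position
]
high_positions = high_information_positions(toy_pwm)
```

A SNP falling on one of the returned positions changes a base the motif strongly prefers, which is why such overlaps are the ones expected to disrupt binding.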
Fig. S10.
Enrichment for T2D GWAS SNPs in regions flanking merged RFX footprint (red) and nonfootprint (blue) motifs.
Fig. S11.
RFX gene expression (FPKM) across islets and 16 Illumina Body map 2.0 tissues. The iESI quintile for each RFX gene is labeled in the islet columns. RFX6 has the highest iESI (0.94) among all RFX TF genes.
