, 17 (8), 2042-2059

A Compendium of Chromatin Contact Maps Reveals Spatially Active Regions in the Human Genome

Anthony D Schmitt et al. Cell Rep.

Abstract

The three-dimensional configuration of DNA is integral to all nuclear processes in eukaryotes, yet our knowledge of the chromosome architecture is still limited. Genome-wide chromosome conformation capture studies have uncovered features of chromatin organization in cultured cells, but genome architecture in human tissues has yet to be explored. Here, we report the most comprehensive survey to date of chromatin organization in human tissues. Through integrative analysis of chromatin contact maps in 21 primary human tissues and cell types, we find topologically associating domains highly conserved in different tissues. We also discover genomic regions that exhibit unusually high levels of local chromatin interactions. These frequently interacting regions (FIREs) are enriched for super-enhancers and are near tissue-specifically expressed genes. They display strong tissue-specificity in local chromatin interactions. Additionally, FIRE formation is partially dependent on CTCF and the Cohesin complex. We further show that FIREs can help annotate the function of non-coding sequence variants.

Figures

Figure 1. Global features of 3D genome organization in 7 cell lines and 14 adult tissues
a) Illustration of the primary 21 Hi-C datasets analyzed, depicting the cell (left panel) or tissue (right panel) origin of the samples, as well as the germ layer origin for tissues (right panel). Hi-C interaction patterns across an 11.68Mb region (chr12:82,840,000-94,520,000) are shown for all 7 cell lines and 14 tissues at 40kb bin resolution. b) Genome browser snapshot showing compartment A/B patterns (PC1 value) across chromosome 2 in 21 samples, with 7 cell lines at the top and 14 primary adult tissues on the bottom. Compartment A/B patterns are at 1Mb bin resolution. Positive PC1 in blue corresponds to the A compartment; negative PC1 in yellow corresponds to the B compartment. c) Bar plots showing the degree of conservation of A/B compartment labels across 21 human cell lines and adult tissues. The Y-axis is the fraction of the genome conserved by the 22 possible combinations of compartment A/B designations. The label below each bar represents the composition of compartment designations. For example, ‘16A/5B’ represents the genomic region where 16 samples exhibit a compartment A label and the other 5 samples exhibit a compartment B label. d) Genome browser snapshot showing topological domain boundaries across chromosome 7 in 21 samples, with 7 cell lines at the top and 14 primary adult tissues on the bottom. Boundaries are identified at 40kb bin resolution. e) Bar plots showing the degree of topological domain boundary conservation across 21 human cell lines and tissues. For each putative boundary region, we tallied how many cell lines have a boundary within that region (see Supplemental Methods). Shown here is a total fraction of TAD boundary regions, whereby the Y-axis is the fraction of TAD boundaries conserved in at least a certain number of samples, as categorized along the x-axis.
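The tally behind Panel c can be sketched as follows. This is only an illustration of the counting, not the authors' pipeline: the function names, the input layout (one list of PC1 values per sample, aligned by bin), and the choice to treat PC1 equal to zero as compartment B are all assumptions made here.

```python
from collections import Counter


def compartment_composition(pc1_by_sample):
    """Fraction of bins in each 'kA/(N-k)B' class.

    pc1_by_sample: list of N lists, one per sample, each holding the PC1
    value of every genomic bin (same bin order in every sample).
    A bin is labelled compartment A in a sample when PC1 > 0
    (assumption: PC1 == 0 is counted as B).
    Returns a dict mapping k (number of samples labelling the bin A)
    to the fraction of bins with that composition, for k = 0..N.
    """
    n_samples = len(pc1_by_sample)
    n_bins = len(pc1_by_sample[0])
    counts = Counter()
    for b in range(n_bins):
        n_a = sum(1 for sample in pc1_by_sample if sample[b] > 0)
        counts[n_a] += 1
    # Report every class, including those never observed.
    return {k: counts[k] / n_bins for k in range(n_samples + 1)}


def composition_label(n_a, n_samples):
    """Label in the style used below each bar, e.g. '16A/5B'."""
    return f"{n_a}A/{n_samples - n_a}B"
```

With 21 samples the returned dictionary has 22 keys (0 to 21 samples in compartment A), which matches the 22 possible combinations mentioned in the legend.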
Figure 2. Identification and positional enrichment of frequently interacting regions
a) Illustrative examples showing the FIRE score methodology. Hi-C contact maps from a 6.68Mb region (chr19:40,480,000-47,160,000) are shown for GM12878 and IMR90 cells at 40kb bin resolution (top). To the right of the contact maps are line plots showing the fully processed FIRE score for each 40kb bin. A red line is drawn at the significance cutoff. The second row of contact maps illustrates FIRE scores in a sub-matrix (chr19:41,560,000-43,200,000) of the above contact maps (black box). Line plots directly below show the intermediate stage in the FIRE score calculation, which is the output from HiCNormCis (see Supplemental Methods). Genome-wide HiCNormCis normalized counts are then z-score transformed and converted to a –ln(p-value) scale to obtain the final FIRE score (bottom line plots). Dashed columns highlight two 40kb bins, one showing a FIRE peak in GM12878 cells but not in IMR90 cells, and the other showing low FIRE score in both cell types. b) Chromosome ideograms showing the genome-wide positional distribution of FIRE bins in GM12878 (blue, n=4,769) and IMR90 (maroon, n=4,729). Genome-wide visualization captures both conserved and specific FIRE bins. Only autosomes are depicted. c) Genome browser snapshot of compartment A/B patterns in 21 samples across chromosome 6 (top), and a genome browser snapshot of a 90Mb subset of chromosome 6 (chr6:25,000,000-115,000,000) showing compartment A/B patterns for 21 samples (top set, blue/yellow) and FIRE calls (bottom set, maroon). d) Barplots showing an enrichment analysis of FIRE positioning within either compartment A or B, illustrating FIREs are enriched in compartment A and depleted in compartment B compared to random permutation of FIRE bin location within each sample (* p < 5.0e-7; ** p < 7.0e-13; # p < 2.2e-16; chi-square test). Statistical tests correspond to the significance of FIRE enrichment in compartment A. 
e) Line plot showing an example of IMR90 FIRE bin positioning relative to TADs (see Supplemental Methods). The red line depicts the observed counts (y-axis) of actual IMR90 FIRE bins, while the gray dashed line shows the counts of permuted FIRE bin locations. The x-axis ranges from 0 to 0.5, where 0 represents TAD boundaries, and 0.5 represents TAD center points. f) Heatmap showing the TAD position enrichment analysis across all 21 samples. Shown are the log2(observed/expected) values for each distance increment, as computed in Panel e.
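The final step described in Panel a (z-score transformation followed by conversion to a –ln(p-value) scale) can be sketched as below. The legend does not say which tail or reference distribution is used, so this sketch assumes a one-sided upper-tail p-value under a standard normal; the function name and that choice are assumptions, and the HiCNormCis normalization itself is not reproduced here.

```python
import math
import sys
from statistics import mean, pstdev


def fire_scores(normalized_counts):
    """Convert per-bin normalized cis-contact counts into FIRE-style scores.

    Steps (assumed reading of the legend):
      1. z-score each bin against the genome-wide mean and standard deviation;
      2. take the one-sided upper-tail p-value of z under N(0, 1);
      3. report -ln(p).
    """
    mu = mean(normalized_counts)
    sigma = pstdev(normalized_counts)
    if sigma == 0:
        raise ValueError("all bins have the same count; z-scores are undefined")
    scores = []
    for x in normalized_counts:
        z = (x - mu) / sigma
        # Upper-tail probability P(Z >= z) for a standard normal.
        p = 0.5 * math.erfc(z / math.sqrt(2))
        # Guard against underflow to 0.0 for extremely large z.
        p = max(p, sys.float_info.min)
        scores.append(-math.log(p))
    return scores
```

Under this assumption, a bin with an average count (z = 0) gets a score of ln 2, and bins with more local contacts than average get progressively larger scores.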
Figure 3. FIREs are tissue-type specific and enriched near genes involved in tissue function
a) At the top, a dendrogram resulting from hierarchical clustering analysis using genome-wide FIRE scores for each sample. The y-axis is the Euclidean distance between FIRE scores from any two samples. The heatmap below shows a subset of FIRE bins (n=8,371), corresponding to FIRE bins that are called as FIRE in only one or two samples. For ventricle tissues, brain tissues, IMR90/MSC, and H1/MES, FIREs specific to two samples are allowed in the definition of sample-specific. b) Genome browser snapshot showing a GM12878-specific FIRE region (chr19:6,560,000-6,640,000) (top, maroon) in an 800kb region around CD70 (chr19:6,583,193-6,604,114). Below is a line plot of FIRE scores for each sample, showing the GM12878-specific FIRE peak (blue). c) Genome browser snapshot showing a brain-specific FIRE region (chr3:78,920,000-78,960,000), shared by CO and HC, in a 760kb region within the ROBO1 gene (chr3:78,646,338-79,068,609). Below is a line plot of FIRE scores for each tissue, showing CO (yellow) and HC (pea green) FIRE peaks. d) GREAT biological process analysis of genes surrounding GM12878-specific FIRE bins (n=1,464 bins) showing biological processes highly related to immune functions. Plotted values are the –log10 of the Bonferroni-corrected binomial p values. e) Same as Panel d, except using genes surrounding brain (CO and HC) specific FIRE bins (n=912 FIRE bins) showing several significant processes highly related to brain functionality. Plotted values are the –log10 of the Bonferroni-corrected binomial p values.
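The sample-specific definition in Panel a (a bin called as FIRE in exactly one sample, or in exactly two samples when those two belong to one of the grouped pairs) might be expressed as in the sketch below. The data layout, function name, and bin identifiers are hypothetical; only the selection rule is taken from the legend.

```python
def sample_specific_bins(fire_calls, allowed_pairs):
    """Select sample-specific FIRE bins.

    fire_calls: dict mapping sample name -> set of FIRE bin identifiers.
    allowed_pairs: iterable of 2-tuples of sample names that may share a
        bin and still count as specific (e.g. the two brain tissues).
    Returns a dict mapping bin identifier -> frozenset of the one or two
    samples in which the bin is called.
    """
    allowed = {frozenset(pair) for pair in allowed_pairs}

    # Invert the calls: which samples call each bin?
    owners = {}
    for sample, bins in fire_calls.items():
        for b in bins:
            owners.setdefault(b, set()).add(sample)

    specific = {}
    for b, samples in owners.items():
        if len(samples) == 1:
            specific[b] = frozenset(samples)
        elif len(samples) == 2 and frozenset(samples) in allowed:
            specific[b] = frozenset(samples)
    return specific
```

For example, with calls for GM12878, CO, HC and LV and `allowed_pairs=[("CO", "HC")]`, a bin shared only by CO and HC is kept as brain-specific, while a bin shared by GM12878 and LV is dropped.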
Figure 4. FIREs are enriched for active enhancers and positioned near sample-specific gene expression
a) Normalized Hi-C contact matrix in Left Ventricle tissue showing a 2.76Mb locus (chr2:40,000,000-42,760,000). Below are genome browser tracks for previously published (Hnisz et al., 2013) LV super-enhancers (red), LV FIRE bins (brown), and UCSC genes, including isoforms (blue). To the right is the continuous LV FIRE score along this locus. b) Heatmaps showing the local enrichment (see Supplemental Methods) of H3K27me3 (left), H3K4me1 (middle), and H3K27ac (right), centered on FIRE bins for each cell line or adult tissue. H3K27me3 data was not available for CO or HC. c) Bar plot showing the observed overlap between actual FIRE bins and previously characterized typical enhancers (blue) (Hnisz et al., 2013) for each available cell line or tissue that has both Hi-C data and typical enhancer calls. Expected values are also shown (green), which are calculated by permuting the location of FIRE bins within each tissue and calculating the overlap with typical enhancers. The y-axis shows the percentage of typical enhancers overlapped by FIREs. d) Same as Panel c, except showing the percentage of super-enhancers overlapped by FIRE bins for each testable cell line or tissue. e) Genome browser snapshot showing an example of sample-specific gene expression near sample-specific FIREs. Shown here is a 780kb locus (chr16:9,820,000-10,600,000) around GRIN2A (chr16:9,852,375-10,276,611). At the top, FIRE tracks (maroon) for each sample, showing the Brain-specific FIRE (chr16:10,040,000-10,080,000, highlighted in yellow) ∼197kb away from the GRIN2A TSS. Below, RNA-Seq data (Roadmap Epigenomics Consortium et al., 2015) for all samples except OV (blue) showing GRIN2A is mainly expressed in brain tissues. f) Bar plot indicating the relative gene expression (see Supplemental Methods) of GRIN2A across 20 samples. g) All-by-all mean-rank enrichment analysis result showing gene expression specificity of genes within 200kb of sample-specific FIRE bins (see Supplemental Methods).
Each row is a different sample type for which the sample-specific FIRE gene set is collected, and columns are the sample type used to calculate the relative expression rank of each gene. IMR90/MSC, H1/MES, and brain tissues were previously shown to have highly overlapping FIRE bins (Figure 3a) and are therefore grouped. Color for each row of the heatmap indicates the enrichment. Outlined in thick black boxes along the diagonal are the matrix entries for which the sample for the sample-specific FIRE gene set and expression rank list are the same. Highlighted in a thin yellow box is the analysis portrayed in Panel h. h) Line plot illustrating a single mean-rank enrichment analysis. The plot shows the relative gene expression values (y-axis) in Cortex as a function of their numeric ranking (x-axis) in Cortex. Vertical dashed lines show the position of the observed mean-rank of Cortex-specific FIRE genes (red dash), and the expected mean-rank based on size-matched randomly selected non-FIRE bins in Cortex (gray dash). Inset is the calculation of enrichment score.
Figure 5. FIREs are conserved across evolution, and mediated by Cohesin
a) Venn diagrams showing the significant number of conserved FIRE bins when lifting over mouse FIREs onto the human genome (left column), or lifting over human FIREs onto the mouse genome (right column) in embryonic stem cells (top row, p value < 5.0e-16), neural progenitor cells (middle row, p value < 2.2e-16), and cortex tissue (bottom row, p value < 2.2e-16). Significance was evaluated using a Fisher's exact test (see Supplemental Methods). b) Normalized Hi-C contact matrix in human cortex (left) and mouse cortex (right) for a 2Mb syntenic region (human chr3:78,000,000-80,000,000; mouse chr16:71,520,000-73,520,000) showing a conserved FIRE (connected black lines) within the same tissue type but across species. Below is a UCSC gene track, and to the right of the contact matrix is the continuous FIRE score across the locus. For the human data, the Hi-C contact matrix, gene track, and FIRE score plot have been inverted to show synteny with the mouse data. c) Normalized Hi-C contact matrices (red and white) or delta matrix (green and blue) for a 1.96Mb locus (chr1:55,400,000-57,360,000) illustrating the change of interaction frequency between TEV and HRV. Directly below the delta matrix are binding profiles of CTCF and the Cohesin subunit SMC3 in wild type HEK cells (Zuin et al., 2014), as well as TAD boundary annotations. To the right of the Hi-C delta matrices is the continuous FIRE Z-score difference between TEV and HRV. Below is a delta matrix at a zoomed-in 800kb region (chr1:55,560,000-56,360,000) for TEV-HRV showing that the greatest reduction of FIRE score occurs at the bin with co-binding of CTCF and SMC3. FIRE Z-score difference is plotted to the right of the subtraction matrices. d) Box plots showing the change in Z-score at FIREs overlapping bins bound by CTCF but not SMC3 “CTCF-only” (left plot), all CTCF peaks (middle plot), and CTCF and SMC3 co-binding (right plot) for the comparison of TEV and HRV.
The red boxes show distributions of FIRE score change at FIRE bins called in wild type cells minus the mutant cells, while the blue boxes are distributions for FIRE score change at FIRE bins called in wild type cells but between biological replicates of wild type cells. These comparisons show the significant reduction of FIRE score at all CTCF peaks, and especially at CTCF SMC3 co-bound peaks overlapping FIRE bins (*p=1.0e-4, **p=4.04e-5; two sample t-test). e) Similar to Panel d, except analysis of Z-score change was done considering FIREs overlapping the Cohesin subunit Rad21 peaks using previously published Hi-C data and Rad21 ChIP-seq data in mouse neural stem cells (left plot) and mouse post-mitotic astrocytes (middle plot) (Sofueva et al., 2013). Comparison of Z-score change upon deletion of Rad21 shows a significant decrease compared to changes observed between biological replicates (*p<0.01; **p< 2.2e-16; two sample t-test). f) Similar to Panel e, except analysis of Z-score change was conducted on previously published Hi-C data and Rad21 ChIP-seq data in mouse thymocytes (Seitan et al., 2013). Comparing the distributions of Z-score changes at FIRE bins bound by Rad21 shows a significant reduction in Z-score between the wild type and Rad21 knockout cells compared to changes between wild type biological replicates (**p< 2.2e-16; two sample t-test).
Figure 6. FIREs are enriched with disease-associated GWAS SNPs
a) Heatmap showing the enrichment of disease-associated GWAS SNPs (see Supplemental Methods) in FIRE bins for each cell line or tissue (columns). Rows represent the enrichment of disease-associated SNPs for one disease, and all rows in the presented heatmap are sorted from high to low based on enrichment score in GM12878 (lymphoblast cell line). Only diseases with >15 SNPs are shown. Noted to the right are the top 15 diseases for which disease-associated SNPs are most enriched in GM12878 FIREs, showing the high enrichment of several diseases (all except mean corpuscular volume) with previously noted immune-mediated pathology (Jostins et al., 2012). b) Normalized Hi-C contact matrix of a 2.16Mb locus (chr1:65,120,000-67,280,000) in GM12878 cells. The tracks below depict the presence of two SNPs associated with acute lymphoblastic leukemia (rs546784 and rs6683977) located within a FIRE bin (brown, chr1:66,760,000-66,800,000), ∼30kb outside of a GM12878-specific super-enhancer (red), and also within the PDE4B gene sequence. To the right of the Hi-C contact matrix is the FIRE score. c) Bar plots showing the enrichment of Parkinson's disease-associated SNPs across 14 primary adult tissue FIRE annotations, also highlighting the highest enrichment in FIREs from both brain tissues (CO and HC). d) Bar plots showing the enrichment of SNPs associated with the quantitative triglycerides trait across 14 primary adult tissue FIRE annotations, also highlighting the highest enrichment in liver FIREs. e) Normalized Hi-C contact matrix (top) in GM12878 for a 4.04Mb locus (chr7:48,440,000-52,480,000) centered on the IKZF1 gene (red text). Hi-C color scale ranges from the 15th to 99th percentile normalized contact frequencies within this locus. The reflected matrix shows the statistically significant (FDR<1e-6) bin-pairs within 2Mb genomic distance across the locus. Only bin-pairs with FDR<1e-6 are yellow, the rest are black.
Between the matrices are UCSC gene annotations (blue, top), RNA-seq data (red), H3K27Ac data (black), typical enhancer annotations (Hnisz et al., 2013) (purple), FIRE annotations (brown), TAD boundary calls (blue) and a SNP that is statistically linked to the IKZF1 TSS (green). The blue lines outline the 440kb locus (chr7:50,240,000-50,680,000) that is shown in Panel f. f) Same as Panel e, except a zoomed-in snapshot of the 440kb locus (chr7:50,240,000-50,680,000) centered on a SNP-bearing FIRE bin (chr7:50,440,000-50,480,000) containing the 3′ UTR of IKZF1 and the SNP rs6964969. The blue box outlines the bin-pair that is the significant interaction between previously known SNP-gene pairs. g) Bar plots showing the enrichment of Liver GTEx eQTLs in FIRE peak bin-pairs, as a function of the subset of top Liver FIRE peaks (based on lowest False Discovery Rate) determined by Fit-Hi-C. h) Same as Panel g, except using Aorta GTEx eQTLs, FIREs and FIRE peaks. i) Same as Panel g, except using Left Ventricle GTEx eQTLs, FIREs and FIRE peaks. j) Same as Panel g, except using Cortex GTEx eQTLs, FIREs and FIRE peaks.
Figure 7. FIREs have several targets and are self-interactive
a) Heatmap showing the relationship between the mean observed contact frequencies at FIREs compared to the mean observed contact frequency at non-FIREs. Enrichment is shown as the ratio between the two mean observed contact frequencies (FIRE:non-FIRE) per unit genomic distance, from +/- 40kb to +/- 2Mb, centered on FIRE bins. Each row represents the analysis of a different sample, and the color intensity corresponds to the enrichment value. b) Box plot for GM12878 showing the distributions of number of statistically significant (FDR<1e-6) Hi-C contacts within 200kb emanating from non-FIRE (blue box) or FIRE (yellow box) bins (two-sample t test p-value < 2.2e-16). c) Same as Panel b, except analysis of Liver data. d) Comparison of the normalized contact matrix (top triangle) to statistically confident (FDR<1e-6) pairwise contacts (bottom triangle) in GM12878 across a 440kb locus centered on BCL11A. Between the matrices are UCSC gene annotations (blue), RNA-seq (red), H3K27Ac (black), typical enhancer annotations (purple) (Hnisz et al., 2013), and FIRE annotations (brown). Color bar values of the Hi-C contact matrix correspond to the 15th and 99th percentiles, respectively, across this locus. In the lower triangle matrix, only the most confident bin-pairs (FDR<1e-6) are colored yellow. e) Line plots in GM12878 showing the normalized Hi-C contact frequency (y-axis) as a function of genomic distance (x-axis) for 3 categories of pairwise interactions: FIRE-FIRE interactions (red line), FIRE-nonFIRE interactions (pink line), and nonFIRE-nonFIRE interactions (gray line). f) Same as Panel e, except analysis in Bladder tissue. g) Venn diagram showing the overlap between all annotated FIRE bins (red circle) in GM12878 and all bins that are involved in statistically significant (FDR<1e-6) pairwise contacts (blue circle). h) Same as panel g, except analysis in Liver tissue.
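The comparison in Panels e and f (mean contact frequency versus genomic distance for FIRE-FIRE, FIRE-nonFIRE, and nonFIRE-nonFIRE bin pairs) could be computed along the lines of the sketch below. It assumes a symmetric, already-normalized contact matrix for one chromosome given as a list of lists, with distance measured in bins (one bin being 40kb at the resolution used in the figures); the function name and arguments are illustrative, not taken from the paper.

```python
from collections import defaultdict

CATEGORIES = ("nonFIRE-nonFIRE", "FIRE-nonFIRE", "FIRE-FIRE")


def contact_by_distance(matrix, fire_bins, max_sep):
    """Mean contact frequency per (pair category, bin separation).

    matrix: square symmetric list of lists of normalized contact values.
    fire_bins: set of bin indices annotated as FIRE.
    max_sep: largest separation, in bins, to include.
    Returns a dict keyed by (category, separation) -> mean contact value.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    n = len(matrix)
    for i in range(n):
        # Upper triangle only, so each bin pair is visited once.
        for j in range(i + 1, min(n, i + max_sep + 1)):
            # 0, 1 or 2 FIRE anchors selects the category.
            n_fire = (i in fire_bins) + (j in fire_bins)
            key = (CATEGORIES[n_fire], j - i)
            sums[key] += matrix[i][j]
            counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Plotting the three categories against separation would give curves of the kind shown in the panels; the legend itself does not state how the values were aggregated, so the simple mean used here is an assumption.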
