Meta-Analysis
Cell. 2014 Jul 31;158(3):673-88.
doi: 10.1016/j.cell.2014.06.027.

H3K4me3 Breadth Is Linked to Cell Identity and Transcriptional Consistency

Free PMC article

Bérénice A Benayoun et al. Cell.

Erratum in

  • Cell. 2015 Nov 19;163(5):1281-6

Abstract

Trimethylation of histone H3 at lysine 4 (H3K4me3) is a chromatin modification known to mark the transcription start sites of active genes. Here, we show that H3K4me3 domains that spread more broadly over genes in a given cell type preferentially mark genes that are essential for the identity and function of that cell type. Using the broadest H3K4me3 domains as a discovery tool in neural progenitor cells, we identify novel regulators of these cells. Machine learning models reveal that the broadest H3K4me3 domains represent a distinct entity, characterized by increased marks of elongation. The broadest H3K4me3 domains also have more paused polymerase at their promoters, suggesting a unique transcriptional output. Indeed, genes marked by the broadest H3K4me3 domains exhibit enhanced transcriptional consistency and [corrected] increased transcriptional levels, and perturbation of H3K4me3 breadth leads to changes in transcriptional consistency. Thus, H3K4me3 breadth contains information that could ensure transcriptional precision at key cell identity/function genes.

Figures

Figure 1. H3K4me3 breadth is an evolutionarily conserved feature that is not predictive of expression levels
A–B) Breadth distributions of H3K4me3 ChIP-seq peaks in H1 hESCs (A) and C2C12-derived myotubes (B) display ‘heavy right tails’, indicative of broader H3K4me3 domains than expected. Inserts: Example H3K4me3 regions in H1 hESCs or C2C12 myotubes. Black bar: ChIP-seq peaks called by MACS2. C) H3K4me3 ChIP signal sorted by breadth at −5kb, +5kb around transcription start sites (TSSs). D–E) mRNA levels are not a function of H3K4me3 breadth quantile at the population (left panels) or single cell (right panels) level by RNA-seq in H1 hESCs (D) or C2C12 myoblasts (E). Insert: Pearson correlation coefficients. See also Figure S1J.
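The "top 5% broadest" subset used throughout the captions can be illustrated with a minimal sketch. It assumes peaks are given as (chromosome, start, end) tuples with half-open coordinates, as in a BED-style peak file; the function names and the toy peaks are hypothetical and are not taken from the study's pipeline.

```python
def peak_breadth(peak):
    """Breadth in bp of a (chrom, start, end) peak with half-open coordinates."""
    _chrom, start, end = peak
    return end - start


def top_fraction_by_breadth(peaks, fraction=0.05):
    """Return the broadest `fraction` of peaks, broadest first."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(peaks, key=peak_breadth, reverse=True)
    if not ranked:
        return []
    # Keep at least one peak so small inputs still yield a result.
    n_top = max(1, round(len(ranked) * fraction))
    return ranked[:n_top]


# Hypothetical toy peaks: 40 peaks with breadths 500, 600, ..., 4400 bp.
peaks = [("chr1", i * 10_000, i * 10_000 + 500 + 100 * i) for i in range(40)]
broadest = top_fraction_by_breadth(peaks)
print([peak_breadth(p) for p in broadest])
```

With 40 toy peaks, the 5% cutoff keeps the two broadest domains.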
Figure 2. H3K4me3 breadth enriches for genes that are important for cell identity and function
A) The top 5% broadest H3K4me3 domains enrich for stem cell regulators in H1 hESCs. Enrichment expressed as −log10(p-value) in Fisher’s exact test. Red dashed line: p = 0.05. See also Figure S2C. B) The top 5% broadest H3K4me3 domains enrich for genes involved in cell/tissue function. Significance as scaled −log10(p-value) in Fisher’s exact test (see Extended Experimental Procedures and Table S2). The NPC dataset is described in Figure 3A–3C. C) Hierarchical clustering of the top 5% broadest H3K4me3 domains from human tissues and cells based on Jaccard Index similarity. See also Figure S2D. D) Measure of cluster tightness (Silhouette index) from different sets of H3K4me3 domains in human tissues. See also Figure S2E–S2G. E) Quantile by quantile (Q-Q) plot of the quantile ranks of H3K4me3 domains marking known cell identity or reprogramming genes in tissue of relevance (Table S3). Significance in Kolmogorov-Smirnov test. F) H3K4me3 breadth is remodeled at a subset of loci during differentiation. Left panel: Scatterplots of H3K4me3 breadth for adipogenesis (3T3L1 in pre- vs. mature adipocyte). Right panel: Remodeled top 5% broadest H3K4me3 domains between pre- and mature adipocytes. See also Figure S2J.
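Two quantities named in this caption, the Jaccard Index and an enrichment p-value from Fisher's exact test, can be sketched in a few lines. This is an illustration only: it assumes the enrichment test is one-sided (over-representation), which the caption does not state, and the gene sets and counts below are hypothetical.

```python
from math import comb, log10


def jaccard_index(set_a, set_b):
    """Size of the intersection divided by size of the union."""
    a, b = set(set_a), set(set_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fisher_enrichment_p(overlap, n_top, n_annotated, n_total):
    """One-sided Fisher's exact p-value for over-representation.

    Equals the hypergeometric upper tail P(X >= overlap) when drawing
    n_top genes from n_total genes, of which n_annotated are annotated.
    """
    max_overlap = min(n_top, n_annotated)
    tail = sum(
        comb(n_annotated, k) * comb(n_total - n_annotated, n_top - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / comb(n_total, n_top)


# Hypothetical gene sets marked by the broadest domains in two samples.
similarity = jaccard_index({"SOX2", "NANOG", "POU5F1"}, {"SOX2", "NANOG", "MYOD1"})

# Hypothetical counts: 3 of 4 top-domain genes are among 5 annotated genes out of 20.
p_value = fisher_enrichment_p(overlap=3, n_top=4, n_annotated=5, n_total=20)
print(similarity, -log10(p_value))
```

A matrix of pairwise Jaccard similarities computed this way is the kind of input a hierarchical clustering would take.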
Figure 3. The top 5% broadest H3K4me3 domains can be used as a discovery tool to identify new regulators of neural progenitor cells
In all panels: n.s. not significant; * p < 0.05; ** p < 0.01; *** p < 0.005 in a Wilcoxon test against control with Bonferroni correction for multiple testing. A) Experimental design for H3K4me3 ChIP-seq datasets in primary cultures of neural progenitors (NPCs) and microdissected niche (subventricular zone). B) H3K4me3 ChIP-seq peaks at a known NPC regulator in independent NPC primary cultures and in the NPC niche. Black bars: ChIP-seq peaks called by MACS2. C) Distribution of H3K4me3 ChIP-seq peaks as a function of their breadth in NPCs reveals that known NPC regulators are marked by broad H3K4me3 domains. D) Genes associated with the top 30 broadest H3K4me3 domains in NPCs. Domains ranked by decreasing H3K4me3 breadth. Known regulators of NPCs in bold. RP: rank of rank product. E) Experimental design to test the role of genes marked by top 5% broadest H3K4me3 domains in NPC proliferation and neurogenesis. F) Proliferation capacity as normalized MTT optical density relative to control. Mean + SD of 2 independent experiments conducted in triplicate. Hashed blue bars: genes whose role in NPCs was discovered while this study was in preparation (Agoston et al., 2014; Ninkovic et al., 2013). See also Figure S3D. G) Proliferation capacity as percentage of infected cells that formed primary neurospheres relative to control. Mean + SEM of at least 2 independent experiments conducted in triplicate. H) Images of new neurons upon Fam72a knock-down. Green: TUJ1 (neurons). Blue: DAPI (nuclei). I) Neurogenesis measured by percentage of DCX+ cells (new neurons) normalized to control. Mean + SEM of at least 2 independent experiments conducted in triplicate. See also Figure S3F.
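Panel D reports a "rank of rank product" (RP) without spelling out the computation. The sketch below shows one common reading, offered as an assumption: rank genes by breadth within each dataset (1 = broadest), multiply the ranks across datasets, then rank the products. Gene names, breadths, and dataset names are hypothetical.

```python
from math import prod


def ranks_descending(values_by_gene):
    """Map each gene to its rank, with rank 1 for the largest value."""
    # Sort names first so ties are broken alphabetically and reproducibly.
    ordered = sorted(sorted(values_by_gene), key=values_by_gene.get, reverse=True)
    return {gene: i + 1 for i, gene in enumerate(ordered)}


def rank_of_rank_product(datasets):
    """Rank genes by the product of their per-dataset breadth ranks."""
    shared = set(datasets[0])
    for data in datasets[1:]:
        shared &= set(data)
    per_dataset = [ranks_descending({g: d[g] for g in shared}) for d in datasets]
    products = {g: prod(r[g] for r in per_dataset) for g in shared}
    ordered = sorted(products, key=lambda g: (products[g], g))
    return {gene: i + 1 for i, gene in enumerate(ordered)}


# Hypothetical domain breadths (bp) in two cultures and one niche sample.
culture_1 = {"geneA": 9000, "geneB": 7000, "geneC": 3000}
culture_2 = {"geneA": 8000, "geneB": 8500, "geneC": 2500}
niche = {"geneA": 9500, "geneB": 6000, "geneC": 4000}

rp = rank_of_rank_product([culture_1, culture_2, niche])
print(rp)
```

A gene that is consistently broad across datasets gets a small rank product and therefore a low RP.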
Figure 4. The broadest H3K4me3 domains are characterized by a specific epigenomic signature
A) Simplified scheme of computational modeling. See also Figure S4A. B) Average classification accuracy and most important contributors associated with the top 5% broadest H3K4me3 domains identified by Random Forest models in 13 cell types and organisms. Contributors for which no data was available in grey with diagonal lines. Tissue-specific transcription factors (TFs) refer to: NANOG (H1 hESCs), SMAD2/3 (H9 hESCs), STAT5 (GM12878), NANOG (mESCs), MYOG/MYOD (Myotubes), LIN-13 (C. elegans embryos). See also Figure S4B–S4E. C) Accuracy of progressive classifications in H1 hESCs and mESCs. Classifications performed between top 5% broadest H3K4me3 domains and other 5% quantile subsets along the breadth continuum. The accuracy of progressive classifications reflects the ability to discriminate domains of that quantile from top 5% broadest H3K4me3 domains. D) Breadth of H3K4me3 domains ‘with/without the top 5% broadest H3K4me3 domain signature’ in H1 hESCs and in mESCs. E–F) Example domains with/without signature in H1 hESCs. Black bars: peaks called by MACS2.
Figure 5. The top 5% broadest H3K4me3 domains are associated with marks of transcriptional elongation and PolII pausing
Indicated p-values for top 5% broadest H3K4me3 domain associated-genes calculated in one-sided one-sample Wilcoxon tests against expected genome-wide value from 10,000 random samplings (red dashed line). A) Differential binding of components of the elongation machinery to top 5% vs. non top 5% broadest H3K4me3 domains in mESCs. p-values from permutation test. B) Enrichments for components of the elongation machinery in mESCs expressed as a percentage of the maximal binding enrichment that can be observed along the H3K4me3 breadth continuum (see A for enrichment at top 5% broadest H3K4me3 domains). See also Figure S5A–S5E. C) Mean ChIP-seq enrichment of Total PolII in mESCs. TSS: transcription start site; TTS: transcription termination site. D) Normalized PolII ChIP-seq density over the proximal promoter and gene body. Comparisons of top 5% broadest H3K4me3 domains against the rest of the distribution also significant in one-sided Wilcoxon tests (9.6×10^−10 < p < 5.4×10^−3) (continued in Figure S5G). E) Mean ChIP-seq enrichment of elongating PolII (Ser2P) in mESCs. TSS: transcription start site; TTS: transcription termination site. F) Normalized elongating PolII (Ser2P) ChIP-seq density over gene bodies. Comparisons of top 5% broadest H3K4me3 domains against the rest of the distribution also significant in one-sided Wilcoxon tests (7.3×10^−23 < p < 2.6×10^−5) (continued in Figure S5J). G) Measure of PolII pausing. Traveling Ratio is defined as background subtracted ChIP-seq density value of PolII at the promoter vs. gene body. H) Normalized Traveling Ratios. Comparisons of top 5% broadest H3K4me3 domains against the rest of the distribution also significant in one-sided Wilcoxon tests (1.8×10^−9 < p < 4.7×10^−2) (continued in Figure S5K). I) Significance for increased chromatin accessibility in mESCs against expected genome-wide value shown as −log10(p-value) in one-sided Wilcoxon tests.
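The Traveling Ratio defined in panel G can be written as a short function. This is a sketch under stated assumptions: a single background value is subtracted from both densities, and a small positive floor (a hypothetical parameter, not from the source) guards against division by zero. The densities below are made up.

```python
def traveling_ratio(promoter_density, gene_body_density, background=0.0, floor=1e-9):
    """Background-subtracted PolII density at the promoter over the gene body.

    Higher values indicate more PolII at the promoter relative to the
    gene body, which is read as more pausing.
    """
    promoter = max(promoter_density - background, 0.0)
    # `floor` is an illustrative safeguard against a zero denominator.
    body = max(gene_body_density - background, floor)
    return promoter / body


# Hypothetical PolII ChIP-seq densities for two genes: (promoter, gene body).
densities = {"geneA": (10.5, 2.5), "geneB": (4.5, 4.5)}
ratios = {
    gene: traveling_ratio(promoter, body, background=0.5)
    for gene, (promoter, body) in densities.items()
}
print(ratios)
```

Here the first gene has five times more signal at its promoter than over its body, while the second shows no promoter-proximal excess.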
Figure 6. H3K4me3 breadth is associated with transcriptional consistency
Indicated p-values for top 5% broadest H3K4me3 domain associated-genes calculated using one-sided one-sample Wilcoxon tests against expected transcriptome-wide value from 10,000 random samplings (red dashed line). A) Transcriptional consistency/variability at the level of single cells or cell populations is defined as variance of expression levels scaled to expression levels (i.e. scaled variance). B) Transcriptional variability at the single cell level (steady state mRNA). Comparisons of top 5% broadest H3K4me3 domains against the rest of the distribution also significant in Wilcoxon tests (2.9×10^−93 < p < 2.4×10^−16). See also Figure S6A. C) Transcriptional variability at the cell population level (steady state mRNA). Comparisons of top 5% broadest H3K4me3 domains against the rest of the distribution also significant in Wilcoxon tests (1.5×10^−170 < p < 3.9×10^−3) (continued in Figure S6B). D) Transcriptional variability at the cell population level (nascent mRNA by GRO-seq). Comparisons of top 5% broadest H3K4me3 domains against the rest of the distribution also significant in Wilcoxon tests (1.8×10^−51 < p < 4.4×10^−20). See also Figure S6C. E) Experimental design for RNA-seq datasets in primary NPC cultures. F) Transcriptional variability at the cell population level in adult NPCs (steady state mRNA). Comparison of top 5% broadest H3K4me3 domains against the rest of the distribution also significant in a Wilcoxon test (p = 3.7×10^−12). G) Significance for lower transcriptional variability in adult NPCs against expected transcriptome-wide value expressed as −log10(p-value) in one-sided Wilcoxon tests.
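The scaled variance of panel A and the random-sampling baseline mentioned at the top of the caption can be sketched as follows. Several choices here are assumptions rather than the study's stated method: "scaled to expression levels" is read as division by the mean, the sample variance is used, and each random sampling is summarized by its median. Gene names and expression values are hypothetical.

```python
import random
from statistics import mean, median, variance


def scaled_variance(expression_values):
    """Sample variance of expression divided by mean expression."""
    avg = mean(expression_values)
    if avg <= 0:
        raise ValueError("scaled variance needs a positive mean expression level")
    return variance(expression_values) / avg


def expected_by_random_sampling(per_gene_values, sample_size, n_samplings=10_000, seed=0):
    """Mean of the medians of random gene subsets of a fixed size."""
    rng = random.Random(seed)
    medians = [
        median(rng.sample(per_gene_values, sample_size))
        for _ in range(n_samplings)
    ]
    return mean(medians)


# Hypothetical expression levels across three replicates or cells.
replicate_expression = {
    "geneA": [10.0, 10.5, 9.5],
    "geneB": [4.0, 8.0, 6.0],
    "geneC": [2.0, 2.0, 2.0],
    "geneD": [1.0, 5.0, 3.0],
}
variability = {gene: scaled_variance(v) for gene, v in replicate_expression.items()}
expected = expected_by_random_sampling(
    list(variability.values()), sample_size=2, n_samplings=1000
)
print(variability, expected)
```

A gene set whose scaled variance sits below the sampled baseline would count as more transcriptionally consistent than expected.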
Figure 7. Experimental perturbation of H3K4me3 breadth results in changes to transcriptional consistency
A) Experimental design to study the effect of knocking-down Wdr5 in primary NPC cultures. B) Western Blot analysis of NPCs treated in control (empty vector) or Wdr5 knock-down after 24h of infection. See also Figure S7B. C) Examples of H3K4me3 peaks in control (empty vector) and Wdr5 knock-down in NPCs after 24h of infection. D) Reduction of H3K4me3 breadth upon 24h Wdr5 knock-down is linked to increased transcriptional variability in NPCs. Variability was measured between 3 biological replicates at genes whose H3K4me3 domains were maintained or reduced upon Wdr5 knock-down. Red dashed line: expected transcriptome-wide value. p-values between genes with maintained vs. reduced breadth in Wilcoxon tests. E) H3K4me3 breadth remodeling upon Wdr5 knock-down and loss of transcriptional consistency in NPCs. Upper panel: −log10(p-value) in one-sided Wilcoxon test for lower transcriptional variability in control infected NPCs than expected transcriptome-wide value. Lower panel: −log10(p-value) in one-sided Wilcoxon tests for increased variability of genes losing H3K4me3 breadth vs. genes of the same original H3K4me3 quantile with maintained breadth. Red dashed line: p = 0.05. F) Examples of H3K4me3 peaks in control (scramble) and Jarid1b knock-down in mESCs after 48h of infection. G) Gain of H3K4me3 breadth upon Jarid1b knock-down is linked to increased transcriptional consistency in mESCs after 48h of infection. Variability between 3 biological replicates at genes whose H3K4me3 domains were maintained vs. extended upon Jarid1b knock-down. Red dashed line: expected transcriptome-wide value. p-values between genes with maintained or extended breadth using Wilcoxon tests. H) Summary model. Broad H3K4me3 domains extend 5′ and 3′ of TSSs and mark genes important for cell identity/function and genes with increased transcriptional consistency. Broad H3K4me3 domains may promote chromatin accessibility, thereby allowing efficient PolII loading and elongation. 
The mechanism responsible for the deposition of broad H3K4me3 domains is unknown but may involve tissue-specific transcription factors and the elongation machinery in a positive feedback loop. These domains may help ‘buffer’ important cell lineage/function genes against environmental fluctuation and can serve as a discovery tool for such genes.
