Genome Res. 2015 Dec;25(12):1873-85.
doi: 10.1101/gr.192799.115. Epub 2015 Nov 11.

The Chromatin Environment Shapes DNA Replication Origin Organization and Defines Origin Classes

Free PMC article

Christelle Cayrou et al. Genome Res.

Abstract

To unveil the still-elusive nature of metazoan replication origins, we identified them genome-wide and at unprecedentedly high resolution in mouse ES cells. This allowed initiation sites (IS) and initiation zones (IZ) to be differentiated. We then characterized their genetic signatures and organization and integrated these data with 43 chromatin marks and factors. Our results reveal that replication origins can be grouped into three main classes with distinct organization, chromatin environment, and sequence motifs. Class 1 contains relatively isolated, low-efficiency origins that are poor in epigenetic marks and are enriched in an asymmetric AC repeat at the initiation site. Late origins are mainly found in this class. Class 2 origins are particularly rich in enhancer elements. Class 3 origins are the most efficient and are associated with open chromatin and polycomb protein-enriched regions. The presence of Origin G-rich Repeated elements (OGRE) potentially forming G-quadruplexes (G4) was confirmed at most origins. These coincide with nucleosome-depleted regions located upstream of the initiation sites, which are associated with a labile nucleosome containing H3K64ac. These data demonstrate that specific chromatin landscapes and combinations of specific signatures regulate origin localization. They explain the frequently observed links between DNA replication and transcription. They also emphasize the plasticity of metazoan replication origins and suggest that in multicellular eukaryotes, distinct genetic features and chromatin configurations act in synergy to define and adapt the origin profile.

Figures

Figure 1.
Three classes of replication origins. (A) Clustering of origins based on read densities around peaks. The left panel displays a heatmap of read densities in 7-kb regions on each side of the peak summit. It shows how an IS is positioned relative to its neighbors and indicates the signal strength (number of reads) and density at each IS. The brown intensity is proportional to the read counts per 100-bp bins. The numbers on the right of the heatmap indicate groups obtained by k-means clustering, and the left/right symmetry between cluster pairs is denoted by identical numbers followed by the L and R suffixes. The three classes of origins defined in the text are highlighted in boxes. The right panel indicates the overlap of IS (± 500 bp from the summit) with CpG islands (blue), promoters (red), and genes (black). (B) Read density mean profiles per class of origins. Each class is defined by assembling groups of IS characterized by a specific distance (dotted vertical lines) between two major IS, except for Class 1 origins, which have a single IS within a 14-kb region. The y-axis represents the average number of reads per peak. Class 1 profile is represented by subgroup 1; Class 2 by subgroups 5LR, 6LR; Class 3a by subgroups 8LR, 9LR; and Class 3b by 10LR, 11. (C) Genomic localization of origins. For each class, the bar plot indicates the proportion of IS associated with intergenic, CGi, promoters, and exons. (D) Distribution of late origins per class. The percentage of IS overlapping with the late-replicating regions defined in Hiratani et al. (2008) (Supplemental Table S1) is indicated for each class.
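The binning and clustering described for panel A can be illustrated with a minimal sketch. It assumes read positions are available as integer genomic coordinates. The function names `bin_reads` and `kmeans`, the toy rows, and the fixed initial centroids are hypothetical; the caption specifies only 7-kb flanks, 100-bp bins, and k-means, so this is not the authors' pipeline.

```python
from typing import List, Sequence


def bin_reads(read_positions: Sequence[int], summit: int,
              flank: int = 7000, bin_size: int = 100) -> List[int]:
    """Count reads per bin in the half-open interval [summit - flank, summit + flank)."""
    n_bins = (2 * flank) // bin_size
    counts = [0] * n_bins
    start = summit - flank
    for pos in read_positions:
        offset = pos - start
        if 0 <= offset < 2 * flank:  # ignore reads outside the window
            counts[offset // bin_size] += 1
    return counts


def kmeans(rows: Sequence[Sequence[float]],
           centroids: Sequence[Sequence[float]],
           n_iter: int = 20) -> List[int]:
    """Plain Lloyd iterations from caller-supplied centroids (an assumption
    made here for determinism). Returns one cluster label per row."""
    centers = [list(c) for c in centroids]
    labels = [0] * len(rows)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, row in enumerate(rows):
            dists = [sum((a - b) ** 2 for a, b in zip(row, c)) for c in centers]
            labels[i] = dists.index(min(dists))
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centers)):
            members = [rows[i] for i in range(len(rows)) if labels[i] == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy usage: one heatmap row for a summit at position 10,000.
row = bin_reads([10000, 10050, 3000, 16999, 17000, 2999], summit=10000)
```

With the default parameters each origin yields a 140-bin row. Stacking such rows and passing them to `kmeans` would group origins by the shape of their neighborhood, which is the idea behind the numbered clusters in the heatmap.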
Figure 2.
Epigenetic marks and chromatin environment at IS. (A,B) Specific chromatin marks/factors associated with open chromatin-2 (blue boxes) and Polycomb complexes (brown boxes) at initiation sites. (A) Distribution of chromatin marks/factors relative to IS (dark green) and IZ (light green). (B) ChIP-seq signals for H3K4me3, H3K9ac, H3K27me3, and SUZ12 around ±7 kb from IS. (C) Hierarchical clustering of Pearson correlations between pairs of marks and/or chromatin factors for all IS. Marks and factors in the heatmap are organized according to the clustering described in Methods. Positive correlations are symbolized by a gradation of red that results from the localization of pairs of marks/factors at individual IS. Negative correlations are symbolized by a gradation of blue. Four significant groups of chromatin marks/factors are highlighted on the right: Open 1 chromatin marks/factors group (mostly associated with transcription initiation); Open 2 chromatin marks group (globally linked to decondensed chromatin); Enhancer marks (Enh.); and Polycomb group marks/factors (PcG). (D) More than 80% of H3K27me3 marks at IZ are associated with PRC1 or PRC2 proteins or are inside bivalent domains. (E) Overlap between marks/factors and origins.
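The pairwise Pearson correlations behind the heatmap in panel C can be sketched as follows. This assumes each mark or factor is summarized as one signal value per IS. The names `pearson` and `correlation_matrix`, the mark labels, and the toy values are hypothetical, and the hierarchical clustering step that orders the heatmap is omitted.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined for a constant vector
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(signals: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """All-against-all correlations between marks, keyed by mark name."""
    names = list(signals)
    return {a: {b: pearson(signals[a], signals[b]) for b in names} for a in names}


# Toy input: three hypothetical marks measured at four initiation sites.
toy = {
    "mark_a": [1.0, 2.0, 3.0, 4.0],
    "mark_b": [2.0, 4.0, 6.0, 8.0],
    "mark_c": [4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(toy)
```

In the toy input, `mark_a` and `mark_b` rise together across sites while `mark_c` falls, which corresponds to the red and blue ends of the color scale described in the caption.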
Figure 3.
Epigenetic marks and chromatin factors at IZ. (A,B) Specific chromatin marks/factors associated with open chromatin-1 (light blue boxes) and enhancers (gray boxes) at initiation zones. (A) Distribution of chromatin marks/factors relative to IS (dark green) and IZ (light green). (B) ChIP-seq signals for H3K27ac, RNA Pol II, H3K4me1, and 5hmC around ±7 kb from the IS. (C) Hierarchical clustering of Pearson correlations between pairs of marks and/or chromatin factors at IZ. Marks and factors in the heatmap are ordered according to the clustering performed for the IS (see Fig. 2C). Positive correlations between pairs of marks/factors in individual IZ regions are symbolized by a gradation of red. Negative correlations are symbolized by a gradation of blue. (D) Percent overlap between poised (H3K4me1/H3K27me3) and active (H3K4me1/H3K27ac) enhancers within IZ. (E) Overlap between marks/factors and origins.
Figure 4.
Origin classes are linked to specific chromatin signatures. (A) Association of chromatin marks/factors within each class. Pearson correlation between pairs of chromatin marks/factors for all IS in each class. Class 1 IS (n = 28,443) are correlated with closed chromatin marks and anticorrelated with all other marks/factors. Class 2 IS (n = 14,873) are the only IS positively correlated with enhancer features (Enh.), whereas Class 3a and 3b IS (n = 10,547) are associated with Open 2 and Polycomb group (PcG) chromatin marks/factors. (B) ChIP-seq signals for 5hmC, H3K27me3, SUZ12, and H3K4me3 around ±7 kb from the center of Class 2 and Class 3a IS.
Figure 5.
Sequence motifs at origins. (A,B) Distribution of OGRE sites around IS in mouse (A) and human (B) ES cells. PWM-based scanning of 1-kb regions on both sides of origin summits to detect OGRE motif instances. The x-axis indicates the position relative to the origin summit, the y-axis the number of predicted OGRE sites per 50-bp window, for the 65,019 mouse origins (A), and 149,791 human origins (B), respectively. (C) Clustering of oligonucleotide occurrence profiles. The left panel shows the hierarchical clustering of positional profiles of 8-mer occurrences in 50-bp non-overlapping windows over 1 kb on each side of peak summits. Each row corresponds to one specific k-mer. The color scale indicates the per k-mer normalized frequency. Local over- or underrepresentation is denoted by red or green hues, respectively. The right panels show detailed examples of individual position profiles for two clusters of k-mers. Colored bold lines represent the median profile of normalized frequencies for the entire cluster. The inset sequence logos were built by scanning sequences with matrices resulting from k-mer assemblies. (D) Profiles of origin-associated motifs. Distribution profiles of origin-associated motifs, obtained by scanning origin regions with the matrices identified in the previous step (C). The y-axis indicates the number of motif occurrences within each bin (300 bp). We combined all profiles in four different groups (G-rich, C-rich, GT/AC, and T/A). Sequence logos of the origin-associated motifs are displayed beside the corresponding profiles. (E) Proportion of origins with the representative motifs. Each origin containing at least one occurrence of the motif representative of each group is reported. The occurrence must be within the most enriched part within the origin (e.g., between +150 and +450 bp from the IS for cluster 2). (F) Proportion of origins positive for a specific motif per class. Origins containing at least one of the representative 8-mers are assigned according to their class. G/C-rich motif-positive origins are in Class 1, 2, and 3a/b (left), whereas origins enriched in AC or TG repeats are mainly in Class 1 (right).
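The positional k-mer profiles described for panel C can be sketched as a count of exact k-mer matches per non-overlapping window. This is a minimal illustration under stated assumptions: the function name `kmer_profile` and the toy sequence are hypothetical, only the forward strand is scanned, and the per-k-mer normalization mentioned in the caption is left out because its formula is not given here.

```python
from typing import List


def kmer_profile(seq: str, kmer: str, window: int = 50) -> List[int]:
    """Count exact occurrences of `kmer` per non-overlapping window.

    An occurrence is assigned to the window containing its start position.
    Trailing bases that do not fill a whole window are ignored.
    """
    n_windows = len(seq) // window
    counts = [0] * n_windows
    k = len(kmer)
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] == kmer:
            w = i // window
            if w < n_windows:
                counts[w] += 1
    return counts


# Toy usage: a 200-bp sequence with a single G-rich 8-mer starting at position 100.
toy_seq = "A" * 100 + "GGGGCGGG" + "A" * 92
profile = kmer_profile(toy_seq, "GGGGCGGG")
```

Applied to a 2-kb region centered on a summit, the same function would return 40 windows per k-mer. Summing those profiles over all origins gives one row of the kind of matrix that is clustered in the left panel.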
Figure 6.
A labile nucleosome is present at the replication initiation site. (A) Nucleosome occupancy at origins. ChIP-seq signals for two different MNase digestion sets at ±1.5 kb around the IS (solid and dashed black lines). The global profile of origins centered on IS is in green. The two MNase profiles are matching, except on the IS, indicating a labile nucleosome. The histone H3 ChIP-seq signal is in red. (B) Distribution of histone H3 variants at origins. ChIP-seq signals for H3.1 (large dashed black line), H3.2 (medium dashed black line), H3.3 (fine dashed black line), and H2AZ (red line) at ±1.5 kb of IS (global profile in green). (C) Distribution of H3K64ac around IS. ChIP-chip signals for H3K64ac (50% top probes distribution) at ±1.5 kb of IS (global profile in green). (D,E) Nucleosome distribution around IS according to the G-rich sequence asymmetry. (D) A group of origins with G-rich asymmetry with initiation sites that show a highly significant enrichment (P-value 1 × 10⁻²) in either G-rich k-mers on the left side of the initiation sites (MNase-strand+, blue line, n = 9518) or in C-rich k-mers on the right side (MNase-strand−, red line, n = 9682). We thus oriented the right-side C-rich plot to systematically represent the features in the 5′ to 3′ direction. Note the striking consistency between MNase profiles obtained from left-side G-rich and right-side C-rich occurrences (MNase-merge, black line). The orange curve indicates the profile of G-rich occurrences around the IS considering the DNA strand on origins oriented according to the asymmetry of G-rich k-mer occurrences. (E) A group of nonasymmetric origins (n = 45,320) was defined by initiation sites with no significant enrichment for either of these two signals. Note the acute peak of MNase at the precise position of the IS and the absence of upstream depletion, which contrasts with G-rich asymmetric origins. (F) Schematic representation of the nucleosome distribution around IS. (G) Genome browser representations of Nascent Strands, MNase1, and MNase2 enrichment profiles are shown for a representative G-rich asymmetric origin.
