Genome Res. 2011 Nov;21(11):1822-32.
doi: 10.1101/gr.124644.111. Epub 2011 Aug 3.

Genome-wide Depletion of Replication Initiation Events in Highly Transcribed Regions

Free PMC article

Melvenia M Martin et al. Genome Res. 2011.

Abstract

This report investigates the mechanisms by which mammalian cells coordinate DNA replication with transcription and chromatin assembly. In yeast, DNA replication initiates within nucleosome-free regions, but studies in mammalian cells have not revealed a similar relationship. Here, we have used genome-wide massively parallel sequencing to map replication initiation events, thereby creating a database of all replication initiation sites within nonrepetitive DNA in two human cell lines. Mining this database revealed that genomic regions transcribed at moderate levels were generally associated with high replication initiation frequency. In genomic regions with high rates of transcription, very few replication initiation events were detected. High-resolution mapping of replication initiation sites showed that replication initiation events were absent from transcription start sites but were highly enriched in adjacent, downstream sequences. Methylation of CpG sequences strongly affected the location of replication initiation events, whereas histone modifications had minimal effects. These observations suggest that high levels of transcription interfere with formation of pre-replication protein complexes. Data presented here identify replication initiation sites throughout the genome, providing a foundation for further analyses of DNA-replication dynamics and cell-cycle progression.

Figures

Figure 1.
Sample replication initiation profiles obtained through massively parallel sequencing of nascent DNA strands. A chromosome ideogram is shown at the top, with the region of interest delineated by a black rectangle. The analyzed region is shown underneath the ideogram, with map coordinates indicated. The experimental tracks (MCF7 or K562 nascent strand [NS]/genome ratios) show the distribution of sequence reads aligned to the indicated region, obtained from massively parallel sequencing of nascent strands from either MCF7 breast cancer cells or K562 erythroleukemia cells. All data are shown as the ratio of reads obtained from a nascent strand preparation to reads obtained from a corresponding control genomic DNA preparation; for each track, the y-axis indicates this nascent strand/genomic DNA ratio. Read counts were normalized as reads per kilobase per million mapped reads (RPKM); for details, see Supplemental Information. RefSeq genes are aligned under the nascent strand distribution: thick boxes represent exons, thin lines represent introns and untranslated regions, and arrows indicate the direction of transcription. Initiation at select sites was verified using real-time PCR, with primers listed in Table 2. Examples of control and nascent strand tracks are shown in Supplemental Figure 1A. (A–C) Mapping replication initiation events at previously characterized replication origins. (A) Data from the MYC locus (human chromosome 8). Replication initiation sites were mapped to the region spanning the promoter to the first exon of the MYC gene. (B) Data from the human β-globin (HBB) locus (human chromosome 11). Replication initiation sites were mapped to the region stretching from the promoter to the first intron of the HBB gene. (C) Data from the HPRT1 gene on the X chromosome. Replication initiation sites were mapped near the HPRT1 promoter. (D) Data from the CTCF locus (chromosome 16q22.1). This region does not contain a known replication origin. Initiation from gene promoters and from the RLTPR gene region (3′ of the CTCF gene) was verified using real-time PCR on an independent preparation of nascent strands (data not shown).
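The caption defers the normalization details to the Supplemental Information, so the following is only a minimal Python sketch of the standard RPKM formula and of the NS/genome ratio built from it, assuming simple per-region read counts. The function names are hypothetical, and returning NaN when the control region has no reads is an assumption; the source does not say how such regions were handled.

```python
import math


def rpkm(reads: int, region_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of region per million mapped reads."""
    if region_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("region length and total mapped reads must be positive")
    per_kb = reads / (region_length_bp / 1_000)
    return per_kb / (total_mapped_reads / 1_000_000)


def ns_genome_ratio(ns_reads: int, genome_reads: int, region_length_bp: int,
                    ns_total: int, genome_total: int) -> float:
    """Nascent strand RPKM divided by genomic-control RPKM for one region.

    Assumption: a region with zero control reads gets NaN (undefined ratio).
    """
    ns = rpkm(ns_reads, region_length_bp, ns_total)
    genome = rpkm(genome_reads, region_length_bp, genome_total)
    if genome == 0:
        return math.nan
    return ns / genome


# Hypothetical counts for one 2 kb region.
print(rpkm(50, 2_000, 10_000_000))                             # 2.5
print(ns_genome_ratio(50, 20, 2_000, 10_000_000, 20_000_000))  # 5.0
```

Normalizing each library by its own total mapped reads is what makes the nascent strand and genomic tracks comparable even when they were sequenced to different depths.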
Figure 2.
Replication and transcription. (A,B) Replication enrichment ratios (nascent strand [NS] versus genomic control RPKM) for all identified genes in MCF7 cells, plotted against log2 GCRMA-normalized gene expression. Genes were binned along the x-axis by expression level, with 897 genes per bin. The first column, however, pools the first five bins, which together include 4485 low-expressing genes that did not differ significantly in log2 GCRMA-normalized expression. (A) The mean enrichment ratio for each bin is plotted against gene expression. (B) Replication enrichment ratios calculated as in A, showing the distribution of enrichment ratio values as a box plot. (C,D) Replication enrichment ratios calculated as in A and B for K562 cells, plotted as a histogram of mean values (C) or as a box plot (D). For mean value histograms (A,C), asterisks represent statistically significant (P < 0.001) divergence from the central bin, which is marked with an arrow. For box plots (B,D), boxes span the second and third quartiles; dots indicate mean values; error bars indicate the 5th and 95th percentiles. The horizontal line in A and C represents the average enrichment ratio of the entire genome.
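The binning described above (rank genes by expression, cut into groups of 897, pool the lowest five groups; note that 5 × 897 = 4485, matching the pooled column) and the box-plot summary can be sketched in Python as follows. This is an illustrative reconstruction, not the authors' code: the tuple input format and function names are hypothetical, keeping a final partial bin is an assumption, and percentile interpolation in the original plotting software may differ from `statistics.quantiles`.

```python
import statistics

BIN_SIZE = 897       # genes per bin, from the caption
MERGED_LOW_BINS = 5  # the first five (lowest-expression) bins form one column


def bin_by_expression(genes, bin_size=BIN_SIZE, merged=MERGED_LOW_BINS):
    """Rank genes by expression and cut them into equal-size bins.

    genes: list of (log2_expression, enrichment_ratio) tuples (hypothetical format).
    Returns lists of enrichment ratios, lowest expression first, with the
    first `merged` bins pooled into one column. A final partial bin, if any,
    is kept as-is (assumption).
    """
    ranked = sorted(genes, key=lambda g: g[0])
    ratios = [ratio for _, ratio in ranked]
    bins = [ratios[i:i + bin_size] for i in range(0, len(ratios), bin_size)]
    pooled = [r for b in bins[:merged] for r in b]
    return [pooled] + bins[merged:]


def bin_means(bins):
    """Mean enrichment ratio per column, as plotted in panels A and C."""
    return [statistics.mean(b) for b in bins]


def box_stats(values):
    """Box-plot summary as described for panels B and D.

    statistics.quantiles uses the 'exclusive' method by default; other
    software may interpolate percentiles differently.
    """
    q1, median, q3 = statistics.quantiles(values, n=4)
    pct = statistics.quantiles(values, n=100)  # 99 cut points; pct[k - 1] is the k-th percentile
    return {"p5": pct[4], "q1": q1, "median": median, "q3": q3,
            "p95": pct[94], "mean": statistics.mean(values)}
```

Ranking-based bins guarantee equal gene counts per column, which keeps the per-bin means comparable even where expression values are densely clustered at the low end.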
Figure 3.
Replication initiation depletion at the transcription start site (TSS) in transcribed genes. (A) Average replication enrichment ratio in MCF7 cells (calculated as in Fig. 2) plotted against distance from the TSS for all known genes. (B) Distribution of replication enrichment ratios in MCF7 cells for groups of genes that exhibit different levels of expression. Levels include the following: very low (log2 GCRMA <2.3), low (log2 GCRMA 2.3–5.3), medium (log2 GCRMA 5.3–8.5), and high (log2 GCRMA >8.5). (C,D) The same analyses are shown for K562 cells.
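Panel A implies a strand-aware distance from each TSS (so that "downstream" follows the direction of transcription), and panel B groups genes by the stated log2 GCRMA cutoffs. A minimal Python sketch of both steps follows. The 500 bp bin width, the `(midpoint, ratio)` window format, and the assignment of values lying exactly on a cutoff are assumptions, not details given in the caption.

```python
from collections import defaultdict


def expression_class(log2_gcrma: float) -> str:
    """Expression categories from the caption.

    Assumption: values on a cutoff go to the higher category, except 8.5,
    which stays 'medium' because 'high' is defined as > 8.5.
    """
    if log2_gcrma < 2.3:
        return "very low"
    if log2_gcrma < 5.3:
        return "low"
    if log2_gcrma <= 8.5:
        return "medium"
    return "high"


def signed_distance_to_tss(position: int, tss: int, strand: str) -> int:
    """Positive values lie downstream of the TSS in the direction of transcription."""
    if strand == "+":
        return position - tss
    if strand == "-":
        return tss - position
    raise ValueError("strand must be '+' or '-'")


def tss_profile(windows, tss, strand, bin_bp=500):
    """Average enrichment ratio by distance from one TSS.

    windows: iterable of (midpoint, enrichment_ratio) pairs (hypothetical format).
    Returns {lower edge of distance bin: mean ratio}, sorted by distance.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for midpoint, ratio in windows:
        d = signed_distance_to_tss(midpoint, tss, strand)
        b = (d // bin_bp) * bin_bp  # floor division keeps upstream bins negative
        sums[b] += ratio
        counts[b] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}
```

Averaging such per-gene profiles across all genes in a category would give curves of the kind shown in panels B and D.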
Figure 4.
Replication initiation in discordantly expressed genes. We calculated replication enrichment ratios (nascent strand RPKM vs. control genomic DNA RPKM) for genes whose level of expression differed significantly between K562 and MCF7 cell types. Genes with log2 GCRMA values <4.3 in MCF7 and >6.3 in K562 were considered MCF7-low and K562-high (MCF7_L:K562_H); genes with log2 GCRMA values <4.3 in K562 and >6.3 in MCF7 were considered MCF7-high and K562-low (MCF7_H:K562_L). (A) Distribution of nascent strand enrichment ratios in MCF7 cells for MCF7_H:K562_L genes. For comparison, enrichment ratio plots for MCF7_H and MCF7_L genes are shown. (B) Distribution of enrichment ratios in K562 cells for MCF7_H:K562_L genes. For comparison, enrichment ratio plots for K562_H and K562_L genes are shown. (C) Distribution of nascent strand enrichment ratios in MCF7 cells for MCF7_L:K562_H genes. For comparison, enrichment ratio plots for MCF7_H and MCF7_L genes are shown. (D) Distribution of enrichment ratios in K562 cells for MCF7_L:K562_H genes. For comparison, enrichment ratio plots for K562_H and K562_L genes are shown.
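The two discordant groups are defined purely by the stated thresholds, so they can be expressed as a small classifier. The Python sketch below assumes a hypothetical `{gene: (mcf7_log2, k562_log2)}` input and uses made-up gene names in its usage note; genes falling between the cutoffs in either cell line belong to neither group.

```python
LOW_MAX = 4.3   # "low" means log2 GCRMA < 4.3 (from the caption)
HIGH_MIN = 6.3  # "high" means log2 GCRMA > 6.3 (from the caption)


def discordant_group(mcf7_expr: float, k562_expr: float):
    """Return the discordant group label, or None if the gene is not discordant."""
    if mcf7_expr < LOW_MAX and k562_expr > HIGH_MIN:
        return "MCF7_L:K562_H"
    if k562_expr < LOW_MAX and mcf7_expr > HIGH_MIN:
        return "MCF7_H:K562_L"
    return None


def split_discordant(expression):
    """expression: dict mapping gene -> (mcf7_log2, k562_log2) (hypothetical format)."""
    groups = {"MCF7_L:K562_H": [], "MCF7_H:K562_L": []}
    for gene, (mcf7, k562) in expression.items():
        label = discordant_group(mcf7, k562)
        if label is not None:
            groups[label].append(gene)
    return groups
```

For example, with hypothetical genes `{"geneA": (3.0, 7.0), "geneB": (7.5, 2.0), "geneC": (5.0, 5.0)}`, `geneA` falls in MCF7_L:K562_H, `geneB` in MCF7_H:K562_L, and `geneC` in neither. The 2-unit gap between the cutoffs excludes genes with only marginal expression differences.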
Figure 5.
Chromatin modifications and replication initiation events. The average nascent strand versus genomic DNA ratio (calculated as described for Fig. 2) is shown for genomic regions that contain the indicated chromatin modification features. Modified regions were identified using data from the UCSC Genome Browser (Table 3). Mean NS/control RPKM ratios (dots) and corresponding SDs (error bars) are shown. The horizontal line indicates a matched random control. Chromatin modification features are sorted by their enrichment in replication initiation events. Statistically significant deviations from the replication ratio of the entire genome (P < 0.001) were observed for all features except c-Jun and SIRT6. For intersections between chromatin features (regions that exhibit combinations of chromatin modifications), please see Supplemental Information.
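Figure 5 summarizes enrichment ratios over regions carrying each chromatin feature. One way to sketch that step, assuming the genome has been tiled into windows that each carry an NS/genome ratio and that each feature track is a set of half-open `(chrom, start, end)` intervals such as those exported from the UCSC Genome Browser, is an interval-overlap filter followed by a mean and sample SD. The window tiling and all helper names are hypothetical, and the matched random control is not reproduced here.

```python
import bisect
import statistics
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def build_index(features):
    """features: iterable of (chrom, start, end). Returns per-chromosome sorted starts/ends."""
    by_chrom = defaultdict(list)
    for chrom, start, end in features:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        merged = merge_intervals(ivs)
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps(index, chrom, start, end):
    """True if [start, end) overlaps any merged feature interval on chrom."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect.bisect_left(starts, end) - 1  # last merged interval starting before `end`
    return i >= 0 and ends[i] > start


def feature_summary(windows, features):
    """windows: iterable of (chrom, start, end, ns_genome_ratio).
    Returns (mean, SD, n) of ratios for windows overlapping the feature."""
    index = build_index(features)
    ratios = [r for chrom, s, e, r in windows if overlaps(index, chrom, s, e)]
    if len(ratios) < 2:
        raise ValueError("need at least two overlapping windows to compute an SD")
    return statistics.mean(ratios), statistics.stdev(ratios), len(ratios)
```

Merging each feature track first makes the intervals non-overlapping and sorted, so a single binary search per window is enough to test overlap.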
Figure 6.
Effect of CpG methylation on the frequency of replication initiation events and gene expression. The level of CpG methylation for MCF7 cells (A) and K562 cells (B) is plotted against gene expression levels (log2 GCRMA, gray histograms) and replication enrichment ratio (nascent strands vs. genomic control, red histograms). For box plots of the data, please see Supplemental Information.
