Nat Genet. 2016 Sep;48(9):984-94.
doi: 10.1038/ng.3616. Epub 2016 Jul 25.

Principles for RNA Metabolism and Alternative Transcription Initiation Within Closely Spaced Promoters

Free PMC article

Yun Chen et al. Nat Genet. 2016.


Mammalian transcriptomes are complex and formed by extensive promoter activity. In addition, gene promoters are largely divergent and initiate transcription of reverse-oriented promoter upstream transcripts (PROMPTs). Although PROMPTs are commonly terminated early, influenced by polyadenylation sites, promoters often cluster so that the divergent activity of one might impact another. Here we found that the distance between promoters strongly correlates with the expression, stability and length of their associated PROMPTs. Adjacent promoters driving divergent mRNA transcription support PROMPT formation, but owing to polyadenylation site constraints, these transcripts tend to spread into the neighboring mRNA on the same strand. This mechanism to derive new alternative mRNA transcription start sites (TSSs) is also evident at closely spaced promoters supporting convergent mRNA transcription. We suggest that basic building blocks of divergently transcribed core promoter pairs, in combination with the wealth of TSSs in mammalian genomes, provide a framework with which evolution shapes transcriptomes.

Conflict of interest statement

The authors declare no competing interests.


Figure 1. A general building block for transcription initiation
a-c: Definitions of promoter, core promoter and TSS follow previously published usage. A divergent promoter is defined as a TSS-encompassing nucleosome-deficient region (NDR) supporting transcription initiation from oppositely oriented core promoters. Such a general building block may produce pairs of promoter upstream transcripts (PROMPT)-mRNA (a), enhancer RNA (eRNA)-eRNA (b) or mRNA-mRNA (c). Note that eRNA-eRNA blocks are tentatively termed ‘promoters’ since they can initiate transcription. Forward (blue) and reverse (red) strands are defined as indicated. Sequence properties downstream of respective core promoters are indicated as callouts; pA: polyadenylation, 5′SS: 5′splice site. d-f: Schematic representation of distinct promoter combinations analyzed in this study. Strands are defined as in (a-c). d: Divergent head-to-head configuration of mRNA-PROMPT promoters. e: Convergent head-to-head configuration of mRNA-PROMPT promoters. f: As (e), but with one un-annotated promoter.
Figure 2. Common organization of divergent RNA-RNA TSS pairs
a: Heat maps showing forward (blue) and reverse (red) strand Cap Analysis of Gene Expression signal following RRP40 depletion (CAGE-RRP40) at TSSs of the RNA classes schematized on top of each map: PROMPT-mRNA (N=1,097), eRNA-eRNA (N=1,288) and mRNA-mRNA (N=663). Rows correspond to TSS pairs centered on the midpoint between the two TSSs, sorted by increasing TSS-TSS distance. The most prevalent TSS positions are marked with dashed black lines. X axes show distances in bp from the midpoint (‘0’). Dashed horizontal lines indicate the distance between TSSs; the number of pairs in each distance group is shown on the right. Insets show distributions of TSS-TSS distances. Color scales show log2 signal intensities on respective strands. Non-logged minimal and maximal plotted values are indicated. White color indicates no mapped reads. b: Heat maps organized as in (a), showing nascent RNA 3′ends from native elongating transcript sequencing (NET-seq) data. c: Heat maps organized as in (a), showing DNase hypersensitivity data. d: Heat maps organized as in (a), showing H3K27ac chromatin immuno-precipitation (ChIP)-seq data. e: Heat maps organized as in (a), showing TFIIB (left) and TBP (right) ChIP-exo data from K562 cells for mRNA-mRNA TSS pairs. f: Heat map organized as in (a), showing K562 RNAPII ChIP-exo data. g: Heat map organized as in (a), showing TSS-associated (TSSa) RNAs inferred by small RNA-seq reads. Inset shows cross-correlation between TSSa RNA 3′ends and NET-seq signals from mRNA-mRNA TSS pairs. The number of analyzed regions is indicated.
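The row layout described for the Figure 2 heat maps (pairs centered on their midpoint and sorted by increasing TSS-TSS distance) can be illustrated with a minimal sketch. This is not the authors' code; the `TssPair` class, its field names and the helper functions are hypothetical and assume both TSSs of a pair lie on the same chromosome.

```python
from dataclasses import dataclass


@dataclass
class TssPair:
    """Hypothetical record for one divergent TSS pair (positions in bp)."""
    reverse_tss: int  # position of the reverse-strand TSS
    forward_tss: int  # position of the forward-strand TSS

    @property
    def distance(self) -> int:
        # TSS-TSS distance used to order the heat map rows
        return abs(self.forward_tss - self.reverse_tss)

    @property
    def midpoint(self) -> float:
        # Midpoint between the two TSSs, plotted as '0' on the x axis
        return (self.forward_tss + self.reverse_tss) / 2


def order_rows(pairs):
    """Return pairs sorted by increasing TSS-TSS distance (one row each)."""
    return sorted(pairs, key=lambda p: p.distance)


def relative_position(pair, genomic_pos):
    """Convert a genomic position to bp relative to the pair midpoint."""
    return genomic_pos - pair.midpoint


pairs = [TssPair(100, 500), TssPair(1000, 1100)]
rows = order_rows(pairs)
```

Here `rows` lists the 100 bp pair before the 400 bp pair, and each TSS of a pair sits symmetrically around position 0 once converted with `relative_position`.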
Figure 3. PROMPT generation and properties between divergent mRNA TSSs
a: Incidence and exosome sensitivity of PROMPTs between mRNA TSSs. Top panel: Cumulative CAGE-RRP40 (black) and CAGE-ctrl (grey) TPM/bp signals falling into PROMPT transcription initiation regions of mRNA-mRNA pairs plotted over increasing TSS-TSS distances. 300bp and 1,000bp boundaries indicate where PROMPTs change properties. Bottom panel: Average TPM/bp signals from the libraries and regions in the top panel. Error bars show 95% confidence intervals. Forward and reverse strand signals are merged. b: Heat maps of reads from transcript isoform sequencing, from RRP40 and ZCCHC8-depleted cells (TIF-seq-RRP40+ZCCHC8), initiating within PROMPT transcription initiation regions of forward (left panel) and reverse (right panel) mRNAs, organized as in Fig. 2a. Schematics on top indicate the analyzed PROMPTs in black. c: PROMPT length distributions measured by TIF-seq-RRP40+ZCCHC8 split by mRNA TSS-TSS distances. d: NET-seq enrichment plot. Y-axis shows log2 average NET-seq signals in a sliding 201bp window downstream of the TSSs of the indicated RNA subtypes, normalized to the signals within a +/- 100bp region around the respective TSSs, as illustrated by schematic on top, with 95% confidence intervals. X-axis shows distances from the respective TSSs. e: Fraction of regions between mRNA TSS pairs with ≥1 predicted polyadenylation (pA) site or 5′ splice site (5′SS) divided by the equivalent fraction from non-genic background, log2-scaled. f: Occurrences of predicted pA sites and 5′SSs within 1kb regions downstream of TSSs of the indicated divergent mRNAs or of their respective PROMPTs. Y-axes show the cumulative fraction of regions having at least one predicted site at or before the indicated distance from the respective TSS (X-axes). For all figures, the numbers of analyzed features are indicated and P values indicate two-sided Mann-Whitney tests.
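The cumulative fraction plotted in Figure 3f can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name is hypothetical, and each region is assumed to be summarized by the distance (in bp) from its TSS to its first predicted site, or `None` when no site is predicted.

```python
def cumulative_fraction(first_site_distances, max_distance):
    """Fraction of regions with at least one site at or before each distance.

    first_site_distances: one entry per region, the distance in bp from the
    TSS to the first predicted site, or None if the region has no site.
    Returns a list indexed by distance 0..max_distance.
    """
    n = len(first_site_distances)
    if n == 0:
        raise ValueError("at least one region is required")
    # Count regions whose first site falls exactly at each distance
    counts = [0] * (max_distance + 1)
    for d in first_site_distances:
        if d is not None and 0 <= d <= max_distance:
            counts[d] += 1
    # Running sum turns per-distance counts into a cumulative fraction
    fractions = []
    running = 0
    for c in counts:
        running += c
        fractions.append(running / n)
    return fractions


curve = cumulative_fraction([0, 2, None, 5], max_distance=3)
```

Regions without any predicted site stay in the denominator, so the curve plateaus below 1 when some regions lack sites.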
Figure 4. Organization of TSS pairs forming NAT and nNAT constellations
a: Schematic overview of analyzed constellations: Annotated natural antisense transcripts (NATs) (left panel) and novel natural antisense transcripts (nNATs) (right panel), with their respective NAT- and nNAT-host mRNAs. Forward strand transcripts, defined by the orientation of the host mRNA strand, are colored blue. Reverse strand transcripts are red. NATs, nNATs and their respective host TSSs are associated with their own PROMPTs, with the indicated nomenclature. The distance (d) between host mRNA- and NAT/nNAT-TSSs is indicated by horizontal tick marks in the heat maps below. b: Heat maps showing forward (blue) and reverse (red) strand CAGE-RRP40 data at NAT (left panel) and nNAT (right panel) constellations centered on the host mRNA TSS and ordered by increasing d. X-axes indicate the distance from the host mRNA TSS in bp. Y axes rows show individual TSS pairs. CAGE-defined host mRNA and NAT/nNAT TSS positions are marked with dashed black lines. Numbers of analyzed regions split by d are indicated on left and right sides, respectively. c: Heat maps organized as in (b), showing GC content. GC-rich regions are indicated. d: Heat maps organized as in (b), showing DNase sensitivity data. NDR locations are indicated. e: Heat maps organized as in (b), showing ENCODE H3K4me3 ChIP-seq data. NDR locations are indicated. f: Heat maps organized as in (b), showing ENCODE H3K4me1 ChIP-seq data.
Figure 5. Properties of NATs and nNATs
a: Heat maps of NAT (left) and nNAT (right) constellations as in Fig. 4b, but split up by reverse and forward (left and right half-panels, respectively) strands and showing log2 CAGE-RRP40/-ctrl ratios. Schematics on top show transcript configurations within constellations; yellow dashed lines indicate NAT/nNAT host mRNA and NAT/nNAT TSSs. b: Average log2 CAGE-RRP40/-ctrl ratios of transcripts from NAT and nNAT constellations shown as bar plots, split up by transcript type. Error bars indicate 95% confidence intervals of means. c: Length and termini distributions of NATs and nNATs. Left panel: Distributions of log10 RNA lengths inferred by TIF-seq-RRP40+ZCCHC8 data, split by transcript type as in (b). Mid panels: Heat maps of TIF-seq-RRP40+ZCCHC8-derived reads initiating at NAT (top) or nNAT (bottom) TSSs, organized as in (a). Right panels: Bar plots showing the number of TIF-seq-RRP40+ZCCHC8 derived 3′ends appearing before the host mRNA TSS (black), within the host mRNA PROMPT territory (blue) or further upstream (pink), defined as in the bottom schematics. Bar plots are split by TSS-TSS distances as indicated. Grey/white areas indicate the regions in the heat maps that are analyzed in the bar plots. d: Relation between host mRNA and NAT/nNAT levels. Y-axis shows levels (log2 CAGE-RRP40 TPM) of host mRNAs, split by levels of NAT (left) or nNAT (right) expression. Boxplot edges correspond to the 1st and 3rd quartiles and midpoints to the median; whiskers extend to the most extreme data point ≤ 1.5×interquartile range from the respective box edge. e: Relation between host mRNA levels and NAT/nNAT ‘traversal’ of the host mRNA TSS. Y-axis shows log2 CAGE-RRP40 TPM signal of host mRNAs, split by transcript type. All NATs considered traversed their host mRNA TSSs; nNATs were split depending on whether their 3′ends fell before or after the host mRNA TSS. Boxplots defined as in d.
For b, c and d, the numbers of analyzed features are indicated, and P-values indicate Mann-Whitney two-sided tests.
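The boxplot convention stated for Figure 5d (box edges at the 1st and 3rd quartiles, whiskers reaching the most extreme data point within 1.5×interquartile range of the box edge) can be written out as a short sketch. The function name is hypothetical, and the quartile interpolation method (`"inclusive"`) is an assumption, since the caption does not specify one.

```python
import statistics


def boxplot_stats(values):
    """Quartiles and whisker ends for a boxplot with 1.5 x IQR whiskers."""
    data = sorted(values)
    # Assumption: 'inclusive' quartiles (linear interpolation over the sample)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme observed points inside the fences
    whisker_low = min(v for v in data if v >= lower_fence)
    whisker_high = max(v for v in data if v <= upper_fence)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": whisker_low,
        "whisker_high": whisker_high,
    }


stats = boxplot_stats([1, 2, 3, 4, 5, 100])
```

In this toy input the value 100 lies beyond the upper fence, so the upper whisker stops at 5 and 100 would be drawn as an outlier.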
Figure 6. PROMPT properties within convergent constellations
a: Exosome sensitivities of PROMPTs within NAT/nNAT constellations. Boxplots show log2 CAGE-RRP40/-ctrl ratios of indicated PROMPTs schematized in Fig. 4a (left panel) or split by mRNA-NAT/nNAT TSS-TSS distance (right panel). b: Length distributions of PROMPTs within NAT/nNAT constellations. Top: Distributions of RNA lengths (TIF-seq-RRP40+ZCCHC8 reads) split by transcript type as in (a). Bottom: Heat maps organized as in Fig. 5c, showing TIF-seq-RRP40+ZCCHC8 reads initiating at the indicated PROMPTs. Dashed lines indicate CAGE-defined NAT/nNAT host mRNA- and NAT/nNAT TSSs. Numbers on Y-axes indicate the distance (d) between host mRNA and NAT/nNAT TSSs. c: Relation between PROMPT length and distance between host mRNA and NAT (left) or nNAT (right) TSSs. Boxplots, defined as in Fig. 5d, show distributions of indicated PROMPT lengths (TIF-seq-RRP40+ZCCHC8 reads), split by mRNA-NAT/nNAT TSS-TSS distance. P-values indicate two-sided Mann-Whitney tests. d: Overlap between PROMPT- and annotated-TSSs. Bar plots display fractions of NAT/nNAT PROMPT TSSs and their host mRNA PROMPT TSSs whose ±100bp flanking regions overlap with annotated TSSs on the same strand, split by mRNA-NAT/nNAT TSS-TSS distance. Fractions across all relevant PROMPT TSSs are indicated. e: Overlap between PROMPT 3′ends (TIF-seq-RRP40+ZCCHC8 reads) and 3′ends of corresponding upstream annotated mRNA, split by mRNA-NAT/nNAT TSS-TSS distance. f: Occurrence of pA sites (black) and 5′SSs (dark red) within 5kb regions downstream of TSSs of mRNAs >5kb. Y-axis: average predicted sites/bp, smoothed by a moving 100bp window. X-axis: distance from the mRNA TSS. Non-genic background site/bp densities are indicated. For a-e, the numbers of analyzed features are indicated.
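The exosome sensitivity measure used in Figures 5 and 6 is a log2 ratio of CAGE signal after RRP40 depletion over control. A minimal sketch of that ratio follows. The function name and the pseudocount are assumptions added here to avoid division by zero; the captions do not state how zero counts were handled.

```python
import math


def exosome_sensitivity(tpm_rrp40, tpm_ctrl, pseudocount=1.0):
    """log2 ratio of CAGE-RRP40 over CAGE-ctrl signal (both in TPM).

    Positive values mean the transcript accumulates when the exosome is
    depleted (exosome-sensitive); values near 0 mean it is insensitive.
    The pseudocount is a hypothetical choice, not taken from the source.
    """
    return math.log2((tpm_rrp40 + pseudocount) / (tpm_ctrl + pseudocount))


sensitive = exosome_sensitivity(3.0, 1.0)
insensitive = exosome_sensitivity(1.0, 1.0)
```

With the default pseudocount, a transcript at 3 TPM after depletion and 1 TPM in control scores 1 (a two-fold increase), while equal signals score 0.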
Figure 7. Models for PROMPT- and alternative TSS-generation within bidirectional constellations
a: PROMPT generation at divergently transcribed mRNA TSSs. Closely spaced divergent TSSs (≤300bp) produce no PROMPTs in the shared NDR (top panel). As the distance increases (301bp-1kb), two NDRs appear, supporting transcription initiation of both mRNAs and PROMPTs (mid panel). These PROMPTs are exosome-insensitive and often span the next NDR to the downstream mRNA 3′end, producing alternative RNA isoforms for that gene. When mRNA TSSs are separated by >1kb (bottom panel), two canonical mRNA-PROMPT pairs appear. b: PROMPT generation at convergently transcribed TSSs. Convergent TSSs (mRNAs vs. NATs/nNATs) derive from individual NDRs, which emit PROMPTs. When nNATs/NATs are proximal to the host mRNA TSS (top and mid panel), their PROMPTs are long and exosome-insensitive. These PROMPT TSSs may become alternative TSSs for the host mRNA. Proximal NATs (mid panel) exert similar constraints on the NAT mRNA host PROMPTs, which may become alternative RNA isoforms for the NAT. This configuration is similar to that of proximal divergent mRNA TSSs (see double-headed arrow). As convergent TSSs are further separated (bottom panel, only commonly occurring for nNATs), nNATs and their PROMPTs become shorter and exosome-sensitive, as they are not influenced by sequence constraints of nearby mRNA TSS regions.


    1. Kapranov P, et al. RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science. 2007;316:1484–1488.
    2. Taft RJ, et al. Tiny RNAs associated with transcription start sites in animals. Nat Genet. 2009;41:572–578.
    3. Preker P, et al. RNA Exosome Depletion Reveals Transcription Upstream of Active Human Promoters. Science. 2008;322:1851–1854.
    4. Core LJ, Waterfall JJ, Lis JT. Nascent RNA Sequencing Reveals Widespread Pausing and Divergent Initiation at Human Promoters. Science. 2008;322:1845–1848.
    5. Seila AC, et al. Divergent transcription from active promoters. Science. 2008;322:1849–1851.
