EMBO J. 2016 Apr 1;35(7):706-23.
doi: 10.15252/embj.201592759. Epub 2016 Feb 19.

Upstream ORFs Are Prevalent Translational Repressors in Vertebrates

Free PMC article

Timothy G Johnstone et al. EMBO J. 2016.

Abstract

Regulation of gene expression is fundamental in establishing cellular diversity and a target of natural selection. Untranslated mRNA regions (UTRs) are key mediators of post-transcriptional regulation. Previous studies have predicted thousands of ORFs in 5'UTRs, the vast majority of which have unknown function. Here, we present a systematic analysis of the translation and function of upstream open reading frames (uORFs) across vertebrates. Using high-resolution ribosome footprinting, we find that (i) uORFs are prevalent within vertebrate transcriptomes, (ii) the majority show signatures of active translation, and (iii) uORFs act as potent regulators of translation and RNA levels, with a similar magnitude to miRNAs. Reporter experiments reveal clear repression of downstream translation by uORFs/oORFs. uORF number, intercistronic distance, overlap with the CDS, and initiation context most strongly influence translation. Evolution has targeted these features to favor uORFs amenable to regulation over constitutively repressive uORFs/oORFs. Finally, we observe that the regulatory potential of uORFs on individual genes is conserved across species. These results provide insight into the regulatory code within mRNA leader sequences and their capacity to modulate translation across vertebrates.

Keywords: gene regulation; ribosome profiling; translation; uORFs.

Figures

Figure 1. uORFs are widespread and translated during zebrafish development

Classification of the protein‐coding transcriptome in zebrafish, human, and mouse reveals that uORFs are widespread and translated. Transcripts containing at least one uORF are marked in purple, transcripts containing no uORFs but at least one oORF are marked in orange, and transcripts lacking both are gray. In zebrafish, three different translation thresholds were applied to classify translated uORFs, and each transcript then classified by its highest confidence uORF: low confidence (dark pink): RPF RPKM > 0; medium confidence (light purple): ORFscore > 0; and high confidence (dark purple): ORFscore > 6.044.
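The three thresholds in this legend can be read as a simple decision rule. The sketch below is illustrative only and is not the authors' pipeline: the function names and the input layout (one `(rpf_rpkm, orfscore)` pair per uORF) are assumptions, and it assumes the tiers are nested so that the strictest threshold met determines the label.

```python
from typing import Iterable, Tuple

HIGH_ORFSCORE = 6.044  # high-confidence cutoff quoted in the legend

# Ordered from weakest to strongest evidence of translation.
LEVELS = ("untranslated", "low", "medium", "high")


def classify_uorf(rpf_rpkm: float, orfscore: float) -> str:
    """Return the strictest confidence tier a single uORF satisfies."""
    if orfscore > HIGH_ORFSCORE:
        return "high"
    if orfscore > 0:
        return "medium"
    if rpf_rpkm > 0:
        return "low"
    return "untranslated"


def classify_transcript(uorfs: Iterable[Tuple[float, float]]) -> str:
    """Label a transcript by its highest-confidence uORF.

    `uorfs` holds one (rpf_rpkm, orfscore) pair per uORF; this input
    layout is an assumption made for the example.
    """
    best = "untranslated"
    for rpf_rpkm, orfscore in uorfs:
        level = classify_uorf(rpf_rpkm, orfscore)
        if LEVELS.index(level) > LEVELS.index(best):
            best = level
    return best
```

For example, a transcript with one footprint-only uORF and one uORF scoring above 6.044 would be labelled high confidence under this rule.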

uORFs and oORFs are widespread throughout the embryonic transcriptome, with a majority of oORF‐containing transcripts also containing at least one uORF. uORF‐containing (purple) and oORF‐containing (orange) transcripts were counted in mouse, human, and zebrafish, and the overlap is shown by Venn diagrams.
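To make the uORF/oORF distinction concrete, here is a minimal sketch of how such ORFs could be enumerated from a transcript sequence. It rests on assumptions not stated in this record: a uORF is taken to be an AUG-initiated ORF that terminates within the leader, an oORF is taken to start in the leader and end past the CDS start codon in a different frame, and in-frame upstream AUGs that read into the CDS are treated as N-terminal extensions and skipped. The function name and coordinate convention are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_upstream_orfs(mrna: str, cds_start: int):
    """Scan the leader of `mrna` for AUG-initiated ORFs.

    `cds_start` is the 0-based index of the A in the main CDS AUG.
    Returns (uorfs, oorfs) as lists of (start, end) pairs, where `end`
    is the index just past the stop codon.
    """
    seq = mrna.upper().replace("T", "U")
    uorfs, oorfs = [], []
    # Only consider AUGs that lie entirely upstream of the CDS start.
    for start in range(0, cds_start - 2):
        if seq[start:start + 3] != "AUG":
            continue
        # Walk codon by codon until the first in-frame stop.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                end = pos + 3
                if end <= cds_start:
                    uorfs.append((start, end))  # terminates inside the leader
                elif (cds_start - start) % 3 != 0:
                    oorfs.append((start, end))  # out of frame, overlaps the CDS
                break
    return uorfs, oorfs
```

A transcript would then count as uORF-containing if the first list is non-empty, and as oORF-only if the first list is empty and the second is not.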

Metagene analysis reveals features of active translation in uORFs classified as translated. Metagene plots display normalized ribosome‐protected fragment density surrounding uORF start and stop codons, colored according to the frame relative to the ORF being translated. CDS regions with and without uORFs are also shown for comparison. Note the clear phasing of ribosome‐protected fragments within high‐ and medium‐confidence uORFs, and the characteristic start and stop RPF peaks across all classes of uORFs.

Ribosome profiling reveals in‐frame translation of uORFs/oORFs in key developmental regulators. RPF‐line plots show the positional distribution of 28 and 29 nt RPFs (above axes) and mRNA‐seq reads (below axes) in the whole gene (below) and first 300 nt (inset above) of Nanog (D), POU5F3 (E), and Smad7 (F). All putative ORFs (Distal AUG‐Stop) are colored by respective frame (blue, pink and green boxes), as are reads according to their P‐site. Note the agreement between ORF color and RPF color, consistent with a strong in‐frame distribution of reads within individual transcripts.

Figure 2. uORFs act repressively in vertebrate development

Most uORFs are not conserved at the peptide sequence level. Pie charts depict coding potential (phyloCSF score) of (A) all potential uORFs and (B) translated uORFs. uORFs with a phyloCSF score ≥ 50 were considered conserved, and uORFs were considered weakly conserved if their phyloCSF score was positive but less than the conservation threshold of 50.

Translated uORFs are enriched in conserved peptides. Enrichment plot indicates log‐odds ratio of conserved uORFs in the set of translated uORFs versus all uORFs.
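The conservation classes and the enrichment statistic described in these two panels can be sketched as follows. This is a hedged illustration: the function names are hypothetical, the natural logarithm is an assumed choice (the legend does not state the base), and any counts passed in would have to come from the real data.

```python
import math


def conservation_class(phylocsf: float, threshold: float = 50.0) -> str:
    """Bin a uORF by phyloCSF score using the cutoffs in the legend."""
    if phylocsf >= threshold:
        return "conserved"
    if phylocsf > 0:
        return "weakly conserved"
    return "not conserved"


def log_odds_ratio(conserved_a: int, total_a: int,
                   conserved_b: int, total_b: int) -> float:
    """Natural-log odds ratio of being conserved in set A versus set B.

    Here set A would be translated uORFs and set B all uORFs.
    """
    odds_a = conserved_a / (total_a - conserved_a)
    odds_b = conserved_b / (total_b - conserved_b)
    return math.log(odds_a / odds_b)
```

A positive value indicates that conserved uORFs are over-represented in set A relative to set B.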

uORF‐containing transcripts are translationally repressed at 5 hpf. Cumulative distribution of translation efficiency in expressed (> 0.5 RPKM) uORF‐containing transcripts versus transcripts lacking uORFs. Transcripts containing oORFs are excluded from this plot. Control transcripts (0 uORFs) have a coding CDS (Global ORFscore > 6.044) but no uORF in their TLS. Two‐sided Wilcoxon P‐values are provided for each uORF set compared to the control.
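The cumulative plots in this figure compare translation efficiency (TE) distributions between transcript sets. The record does not define TE, so the sketch below assumes the conventional ribosome-profiling definition (footprint density divided by mRNA abundance, on a log2 scale). All names and the dictionary layout are hypothetical.

```python
import math
from bisect import bisect_right


def translation_efficiency(rpf_rpkm: float, rna_rpkm: float) -> float:
    """log2 ratio of footprint density to mRNA abundance (assumed definition).

    Both inputs must be positive.
    """
    return math.log2(rpf_rpkm / rna_rpkm)


def ecdf(values):
    """Return a function F where F(x) is the fraction of values <= x."""
    ordered = sorted(values)
    n = len(ordered)

    def F(x: float) -> float:
        return bisect_right(ordered, x) / n

    return F


def expressed(transcripts, min_rna_rpkm: float = 0.5):
    """Keep transcripts above the expression cutoff quoted in the legend."""
    return [t for t in transcripts if t["rna_rpkm"] > min_rna_rpkm]
```

Plotting `F` for the uORF-containing set and for the control set on the same axes gives the kind of comparison described here: a curve shifted to the left indicates lower TE.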

Translation is significantly repressed in oORF‐containing transcripts. Cumulative distribution of translation efficiency at 5 hpf in expressed (> 0.5 RPKM) oORF‐containing transcripts versus transcripts lacking oORFs. Transcripts containing uORFs are excluded from this set. Control transcripts (0 oORFs) have a coding CDS (Global ORFscore > 6.044) but no uORF in their TLS. Two‐sided Wilcoxon P‐value is provided for the oORF set compared to the control.

miR‐430 is a widespread developmental translation repressor. Cumulative distribution of translation efficiency at 5 hpf in expressed (> 0.5 RPKM) miR‐430 site‐containing transcripts (single or multiple 7/8‐mers) versus transcripts which lack a miR‐430 site in their 3′ UTR. Two‐sided Wilcoxon P‐value is provided for the miR‐430 set compared to the control.

uORFs are associated with lower RNA levels. Cumulative distribution of RNA levels at 5 hpf in expressed (> 0.5 RPKM) uORF‐containing transcripts versus transcripts lacking uORFs. Transcripts containing oORFs are excluded from this plot. Control transcripts (0 uORFs) have a coding CDS (Global ORFscore > 6.044) but no uORF in their 5′ UTR. Two‐sided Wilcoxon P‐values are provided for each uORF set compared to the control.

oORFs are associated with lower RNA levels. Cumulative distribution of RNA levels at 5 hpf in expressed (> 0.5 RPKM) oORF‐containing transcripts versus transcripts lacking oORFs. Transcripts containing uORFs are excluded from this set. Control transcripts (0 oORFs) have a coding CDS (Global ORFscore > 6.044) but no uORF in their 5′ UTR. Two‐sided Wilcoxon P‐value is provided for the oORF set compared to the control.

miR‐430 targets RNAs for degradation by 5 hpf. Cumulative distribution of RNA levels at 5 hpf in expressed (> 0.5 RPKM) miR‐430 site‐containing transcripts (single or multiple 7/8‐mers) versus transcripts which lack a miR‐430 site in their 3′ UTR. Two‐sided Wilcoxon P‐value is provided for the miR‐430 set compared to the control.

Figure EV1. Conservation of uORFs in mammals

Pie charts depict coding potential (phyloCSF score) of (A) human uORFs and (B) mouse uORFs. uORFs with a phyloCSF score ≥ 50 were considered conserved, and uORFs were considered weakly conserved if their phyloCSF score was positive but less than the conservation threshold of 50.

Legend displaying the color codes for various types of substitutions in the representative multiple alignments. Amino acid substitutions are considered conservative if they have a positive BLOSUM62 score.

Representative human uORF alignments (across 29 mammals) are shown for a (D) conserved uORF, (E) weakly conserved uORF, and (F) non‐conserved uORF. Representative zebrafish uORF alignments (across 5 teleosts) are shown for a (G) conserved uORF, (H) weakly conserved uORF, and (I) non‐conserved uORF.

Figure EV2. uORF and oORF presence repress translation across zebrafish development (additional time points)

Cumulative plots show translation efficiency of CDSs with varying numbers of uORFs for expressed transcripts at (A) 2 hpf, (B) 12 hpf, (C) 24 hpf, and (D) 48 hpf. P‐values are calculated versus transcripts lacking uORFs using a Wilcoxon rank‐sum test with continuity correction.
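For readers unfamiliar with the test named here, the following is a minimal pure-Python sketch of a two-sided Wilcoxon rank-sum test using the textbook normal approximation with continuity correction and a tie-corrected variance. Which software the authors used is not stated in this record, so this should be read as an illustration of the statistic rather than a reproduction of their analysis; the function name is hypothetical.

```python
import math
from collections import Counter


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum P-value (normal approximation).

    Uses midranks for ties, a tie-corrected variance, and a continuity
    correction of 0.5. Needs at least two observations in total.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    counts = Counter(list(x) + list(y))

    # Assign midranks so tied values share the average of their ranks.
    midrank = {}
    position = 0
    for value in sorted(counts):
        t = counts[value]
        midrank[value] = position + (t + 1) / 2
        position += t

    u = sum(midrank[v] for v in x) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    tie_term = sum(t ** 3 - t for t in counts.values()) / (n * (n - 1))
    variance = n1 * n2 / 12 * ((n + 1) - tie_term)
    if variance == 0:
        return 1.0  # all observations identical

    diff = u - mean
    correction = 0.5 if diff > 0 else -0.5 if diff < 0 else 0.0
    z = (diff - correction) / math.sqrt(variance)
    # Two-sided tail probability of the standard normal.
    return math.erfc(abs(z) / math.sqrt(2))
```

In this setting `x` would hold the TE values of transcripts with a given number of uORFs and `y` those of the transcripts lacking uORFs.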

Additional cumulative plots show translation efficiency of CDSs with or without oORFs for expressed transcripts at (E) 2 hpf, (F) 12 hpf, (G) 24 hpf, and (H) 48 hpf.

Further cumulative plots display translation efficiency of CDSs (I) or RNA expression (J) at 5 hpf in zebrafish transcripts with at least one high‐, medium‐, or low‐confidence uORF versus a control set lacking uORFs.

Figure 3. uORFs and oORFs regulate translation in mammals

uORF‐containing transcripts are repressed in HeLa cells. Plot displays the cumulative distribution of translation efficiency in expressed (> 0.5 RPKM) transcripts containing 1, 2, or > 2 uORFs versus transcripts lacking uORFs. Transcripts containing oORFs are excluded from this set. Two‐sided Wilcoxon P‐values are provided for each uORF set compared to the control.

oORF‐containing transcripts are repressed in HeLa cells. Plot displays the cumulative distribution of translation efficiency in expressed (> 0.5 RPKM) oORF‐containing transcripts versus transcripts lacking oORFs. Transcripts containing uORFs are excluded from this set. Two‐sided Wilcoxon P‐value is provided for the oORF set compared to the control.

uORF‐containing transcripts are repressed in murine embryonic stem cells. Plot displays the cumulative distribution of translation efficiency in expressed (> 0.5 RPKM) transcripts containing 1, 2, or > 2 uORFs versus transcripts lacking uORFs. Transcripts containing oORFs are excluded from this set. Two‐sided Wilcoxon P‐values are provided for each uORF set compared to the control.

oORF‐containing transcripts are repressed in murine embryonic stem cells. Plot displays the cumulative distribution of translation efficiency in expressed (> 0.5 RPKM) oORF‐containing transcripts versus transcripts lacking oORFs. Transcripts containing uORFs are excluded from this set. Two‐sided Wilcoxon P‐value is provided for the oORF set compared to the control.

uORF translation is correlated with CDS repression. Scatterplot displays the per‐sample mean repression of uORF‐containing transcripts versus the mean translation efficiency of uORFs in single‐uORF transcripts. Repression is determined by calculating the difference between the mean TE of CDSs in uORF‐containing transcripts versus the mean TE of CDSs in transcripts lacking uORFs/oORFs in their TLS. Only expressed transcripts (> 0.5 RNA RPKM across samples per organism) were counted. Labels indicate the sample name. For more information on individual samples, see Table EV3.

Figure EV3. Controls for RNA levels and TLS length

Scatterplots show the relationship between RNA level and CDS translation efficiency (zebrafish, 5 hpf) controlling for the number of uORFs by selecting (A) transcripts lacking uORFs (Pearson's r = 0.063, P = 7.4e‐5) and (B) transcripts containing a single uORF (Pearson's r = 0.102, P = 4.5e‐6).

TLS length and number of uORFs are well correlated in zebrafish (Pearson's r = 0.82).

Scatterplots show the relationship between TLS length and CDS translation efficiency (zebrafish, 5 hpf) controlling for the number of uORFs by selecting (D) transcripts lacking uORFs (Pearson's r = 0.011, P = 0.443) and (E) transcripts containing a single uORF (Pearson's r = 0.156, P < 2.2e‐16).

Figure EV4. Additional factors that influence uORF repression in vertebrates (presented in zebrafish at 5 hpf)

Scatterplots present the effect of various uORF sequence features on translation in zebrafish at 5 hpf. (A) Points indicate the translation efficiency of CDSs in expressed, oORF‐containing transcripts which lack uORFs. The x‐axis indicates the relative position of the CDS AUG to the oORF stop codon, with more negative numbers indicating larger overlap. The amount of overlap is not significantly correlated with CDS TE (Pearson's r = 0.014, P = 0.720). (B, C) Points indicate the translation efficiency of CDSs in expressed, single‐uORF‐containing transcripts which lack oORFs, versus (B) intercistronic distance (Pearson's r = 0.147, P = 1.43e‐13) and (C) uORF size (Pearson's r = 0.026, P = 0.237). (D) uORF size is inversely correlated with uORF translation efficiency (Pearson's r = −0.31, P = 2.49e‐31).

Figure 4. uORF sequence features are targets of selection

uORF initiation contexts display signatures of selection. Plot displays cumulative distribution of AUG context scores calculated using nucleotide scoring matrices (Grzegorski et al, 2014) across multiple classes of ORFs (translated uORFs (high confidence), untranslated uORFs, CDS ORFs, and TLS background), with a higher score indicating better initiation context. TLS background represents the distribution of scores of a randomly sampled set of 50,000 sequences from zebrafish 5′ UTRs. Insets display sequence logos for CDSs, translated uORFs, and oORFs.
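Scoring an initiation context with a nucleotide scoring matrix amounts to summing per-position weights for the bases flanking an AUG. The sketch below shows the mechanics only. The weights are toy values invented for illustration and are NOT the published matrix cited in the legend; the chosen positions and the function name are likewise assumptions.

```python
# Toy weights only: NOT the published matrix. Offsets are relative to the
# A of the AUG (A = 0), so -3 is the classic Kozak purine position and +3
# is the base right after the AUG (often written "+4").
TOY_MATRIX = {
    -3: {"A": 1.0, "G": 0.5, "C": -0.5, "U": -1.0},
    -1: {"A": 0.0, "G": 0.0, "C": 0.5, "U": -0.5},
    3: {"A": 0.0, "G": 1.0, "C": -0.5, "U": -0.5},
}


def context_score(seq: str, aug_index: int, matrix=TOY_MATRIX):
    """Sum per-position weights around the AUG at `aug_index`.

    Returns None when the flanking window runs off either end of `seq`.
    """
    seq = seq.upper().replace("T", "U")
    if seq[aug_index:aug_index + 3] != "AUG":
        raise ValueError("no AUG at the given index")
    total = 0.0
    for offset, weights in matrix.items():
        pos = aug_index + offset
        if pos < 0 or pos >= len(seq):
            return None
        total += weights[seq[pos]]
    return total
```

With a real matrix, scores computed this way for uORF, CDS, and randomly sampled leader positions could be compared as cumulative distributions, as the legend describes.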

uORF initiation context influences repression of downstream translation. Plot displays cumulative distribution of translation efficiency in transcripts with single uORFs in favorable initiation contexts (top quintile of all uORFs) versus unfavorable contexts (bottom quintile of all uORFs). Inset displays where these quintiles lie on the distribution of all uORF AUG scores.

AUG frequency is lower proximal to the CDS start codon. Plot displays AUG frequency (as a fraction of all codons), split by frame relative to the CDS start codon. Points show frequencies at individual codon positions and loess regression lines display the overall trend.
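One way to tabulate AUG frequency by position and frame relative to the CDS start codon is sketched below. It assumes each leader sequence ends immediately before the CDS AUG; the function name and the return layout are hypothetical, and the loess smoothing mentioned in the legend is not reproduced.

```python
from collections import defaultdict


def aug_frequency_by_position(leaders):
    """Fraction of leaders with an AUG starting `d` nt upstream of the CDS.

    Each leader must end immediately before the CDS start codon.
    Returns {d: frequency}; the reading frame relative to the CDS is d % 3,
    with 0 meaning in frame with the CDS.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for leader in leaders:
        seq = leader.upper().replace("T", "U")
        for start in range(len(seq) - 2):
            d = len(seq) - start  # distance of this triplet from the CDS AUG
            totals[d] += 1
            if seq[start:start + 3] == "AUG":
                hits[d] += 1
    return {d: hits[d] / totals[d] for d in totals}
```

Grouping the returned positions by `d % 3` gives one series per frame, which is the split the plot describes.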

Vertebrate TLSs contain fewer uORFs than expected. Histograms show the distribution of z‐scores in zebrafish, mouse, and human TLSs, with positive z‐scores indicating uORF enrichment and negative z‐scores indicating uORF depletion, relative to sequence‐shuffled TLSs.
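The z-score comparison against shuffled sequences can be illustrated with a short sketch. Two simplifications are assumed here and may differ from the authors' procedure: the number of AUG triplets is used as a proxy for uORF count, and shuffling is done at the single-nucleotide level. The function names are hypothetical.

```python
import random
import statistics


def count_uorf_starts(leader: str) -> int:
    """Count AUG triplets in a leader (a simple proxy for uORF number)."""
    seq = leader.upper().replace("T", "U")
    return sum(seq[i:i + 3] == "AUG" for i in range(len(seq) - 2))


def uorf_z_score(leader: str, n_shuffles: int = 1000, seed: int = 0):
    """z-score of the observed count against mononucleotide shuffles.

    Negative values indicate depletion relative to shuffled sequences.
    """
    rng = random.Random(seed)
    observed = count_uorf_starts(leader)
    bases = list(leader)
    shuffled_counts = []
    for _ in range(n_shuffles):
        rng.shuffle(bases)
        shuffled_counts.append(count_uorf_starts("".join(bases)))
    mean = statistics.fmean(shuffled_counts)
    sd = statistics.pstdev(shuffled_counts)
    if sd == 0:
        return None  # shuffling never changes the count
    return (observed - mean) / sd
```

Applying this to every leader in a transcriptome and plotting the resulting values as a histogram would give a distribution comparable in spirit to the one described.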

uORFs are shorter than expected by chance. Histogram showing length distribution of all uORFs versus canonical protein‐coding regions, with inset providing a closer look at uORFs (bin size 10 nt). Vertical dotted lines indicate the observed mean length of endogenous uORFs and the mean length of uORFs obtained by sequence shuffling of zebrafish TLSs, which differ significantly (two‐sided P < 4.5e‐308).

Figure EV5. AUG context and translation

Cumulative plots show the distribution of initiation context scores in zebrafish for uORFs, oORFs, 3′ UTR ORFs (dORFs), and CDS, further broken down by their translation classification. The TLS background is calculated by scoring the nucleotide context around 50,000 randomly selected TLS positions.

uORF AUG scores are significantly correlated with uORF translation in single‐uORF transcripts at 5 hpf in zebrafish (Pearson's r = 0.30, P < 2.2e‐16).

uORF AUG scores are inversely correlated with downstream CDS translation in single‐uORF transcripts (Pearson's r = −0.094, P = 5.15e‐5).

Cumulative plots show the distribution of initiation context scores in (D) mouse and (E) human, for uORFs, oORFs, 3′ UTR ORFs (dORFs), and CDS. The TLS background is calculated by scoring the nucleotide context around 50,000 randomly selected TLS positions in each species.

Figure 5. uORFs and oORFs repress downstream reporter translation

Schematic displays uORF/oORF configurations for reporter experiments. GFP reporters contained variable uORF configurations: no uORFs, 1 uORF, 3 uORFs, 1 oORF, or 1 uORF in weak initiation context. TLS length (104 nt) and polyA tail length (60A) were constant across all reporters, and TLS sequence differed only by single nucleotide changes at each uORF start codon (or two single nucleotide changes in the weak context reporter). Constructs (100 pg) were coinjected with dsRed (150 pg) into 1‐cell‐stage embryos and quantified at 24 hpf.

Fluorescent microscopy images of representative embryos expressing each GFP reporter and the dsRed control 24 h post‐injection. uORFs and oORFs repress downstream translation as predicted by analysis of endogenous transcripts. Repression is observed in reporters with uORF‐/oORF‐containing TLSs, but the effect is weaker for a uORF with a weak initiation context. Group pictures can be found in Appendix Fig S2.

Bar plot displays fluorescence quantification of 24‐h embryos injected with each reporter. GFP fluorescence intensity was normalized to dsRed intensity in each embryo with robust dsRed expression, then mean fluorescence for each reporter was scaled relative to the no‐uORF reporter (the number of embryos measured for each reporter is displayed below the x‐axis). Error bars display ± SEM. Reporter fluorescence was compared using unpaired two‐tailed Student's t‐test and was significant for all comparisons: **P < 0.01—no uORFs versus 1 weak context uORF (P = 2.98e‐3); ****P < 0.0001—no uORF versus 1 uORF (P = 3.21e‐9), 1 uORF versus 3 uORFs (P = 6.66e‐5), 1 uORF versus 1 oORF (P = 7.65e‐11), 1 uORF versus 1 uORF weak context (P = 3.24e‐5).
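The normalization steps in this legend (per-embryo GFP/dsRed ratio, mean and SEM per reporter, scaling to the no-uORF reporter) can be sketched as below. The function names, the reporter labels, and the `min_dsred` cutoff are assumptions: the legend does not say how "robust dsRed expression" was defined.

```python
import math
import statistics


def summarize_reporter(gfp, dsred, min_dsred=0.0):
    """Per-embryo GFP/dsRed ratios -> (mean, SEM, n).

    Embryos whose dsRed signal is not above `min_dsred` are dropped; the
    cutoff used to define "robust" expression is not given in the legend.
    """
    ratios = [g / r for g, r in zip(gfp, dsred) if r > min_dsred]
    n = len(ratios)
    mean = statistics.fmean(ratios)
    sem = statistics.stdev(ratios) / math.sqrt(n) if n > 1 else float("nan")
    return mean, sem, n


def scale_to_control(summaries, control="no uORF"):
    """Express each reporter's mean and SEM relative to the control mean."""
    ref = summaries[control][0]
    return {name: (mean / ref, sem / ref, n)
            for name, (mean, sem, n) in summaries.items()}
```

After scaling, the control reporter has a mean of 1, so the other bars read directly as fractions of unrepressed expression.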

Figure 6. uORF regulatory activity is conserved across vertebrates

Cartoon shows strategy for investigating conservation of uORF regulatory activity. The ratio of translation between the TLS and CDS is calculated and compared between 1‐to‐1 homologs in the same tissue type across species.

uORF activity is correlated across species. Scatterplots display the translation ratio comparison between the same transcripts (TLS length > 100 nt, 1‐1 mouse–human homology) in fibroblasts (BJF cells in human: sample bjf2, MEF cells in mouse: sample mef2wt) (B) and brain (samples hbrainwt and brainwt) (D). The correlation between species is not due to CDS signal correlation. Scatterplots (C, E) display the translation ratio comparison between homologous transcripts (TLS length > 100 nt, 1‐1 mouse–human homology) in fibroblasts (B) and brain (D), maintaining CDS pairings while shuffling which TLS is associated with each CDS.
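The cross-species comparison and its shuffled control can be sketched as follows. This is a hedged illustration: the log2 scale, the independent permutation of TLS values within each species, and all function names are assumptions, since the legend only states that CDS pairings are kept while TLS assignments are shuffled.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log_ratio(tls, cds):
    """log2(TLS signal / CDS signal) for each homolog."""
    return [math.log2(t / c) for t, c in zip(tls, cds)]


def shuffled_control(tls_a, cds_a, tls_b, cds_b, seed=0):
    """Correlation after breaking TLS-CDS links but keeping CDS pairs.

    Lists are ordered so that index i refers to the same 1-to-1 homolog
    in species A and species B.
    """
    rng = random.Random(seed)
    perm_a = rng.sample(tls_a, len(tls_a))
    perm_b = rng.sample(tls_b, len(tls_b))
    return pearson_r(log_ratio(perm_a, cds_a), log_ratio(perm_b, cds_b))
```

If the observed correlation of `log_ratio` values between species were driven by CDS signal alone, the shuffled control would be expected to retain a similar correlation; a clear drop suggests the TLS contributes.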
