Genome Res. 2019 Jun;29(6):932-943.
doi: 10.1101/gr.239822.118. Epub 2019 May 31.

Turnover of ribosome-associated transcripts from de novo ORFs produces gene-like characteristics available for de novo gene emergence in wild yeast populations

Éléonore Durand et al. Genome Res. 2019 Jun.

Abstract

Little is known about the rate of emergence of de novo genes, what their initial properties are, and how they spread in populations. We examined wild yeast populations (Saccharomyces paradoxus) to characterize the diversity and turnover of intergenic ORFs over short evolutionary timescales. We find that hundreds of intergenic ORFs show translation signatures similar to canonical genes, and we experimentally confirmed the translation of many of these ORFs in laboratory conditions using a reporter assay. Compared with canonical genes, intergenic ORFs have lower translation efficiency, which could imply a lack of optimization for translation or a mechanism to reduce their production cost. Translated intergenic ORFs also tend to have sequence properties that are generally close to those of random intergenic sequences. However, some of the very recent translated intergenic ORFs, which appeared <110 kya, already show gene-like characteristics, suggesting that the raw material for functional innovations could appear over short evolutionary timescales.

Figures

Figure 1.
Overview of iORF annotation and translation detection procedure. For a more complete description, see Methods and Supplemental Figure S1. iORF annotation was conducted using S. paradoxus strains that are structured in three main lineages (SpA, SpB, and SpC) with S. cerevisiae as an outgroup. Pairs of genes annotated as syntenic were used to align intergenic genomic regions in which iORFs were characterized. The age of an iORF was estimated using reconstructions of ancestral intergenic sequences at nodes N1 and N2 (in red) to infer their emergence along phylogenetic branches (named b1 to b4, in gray). We chose four strains (one per S. paradoxus lineage and one S. cerevisiae) to characterize the repertoire of translated iORFs (tORFs) using ribosome profiling. iORFs without translation signature were named ntORFs.
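
As a rough illustration of what annotating iORFs in an intergenic region involves, the sketch below scans the forward strand of a sequence for ATG-to-stop ORFs in all three reading frames. The function name `find_orfs`, the `min_codons` cutoff, and the choice of the first ATG after each stop are assumptions made for this example; they are not taken from the authors' pipeline, which is described in the paper's Methods.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) tuples for ATG..stop ORFs on the forward strand.

    start is 0-based; end is exclusive and includes the stop codon.
    min_codons counts the start codon but not the stop codon
    (a hypothetical cutoff chosen for illustration).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG seen since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs
```

A full annotation would also scan the reverse complement and map coordinates back onto the alignment of syntenic intergenic regions; that bookkeeping is left out here.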
Figure 2.
A fraction of the iORFs display translation signatures similar to genes. (A) Distribution of the ribosome profiling read counts for genes (gray) and iORFs (purple) at the start codon position. (B) Number of genes (Gen) or iORFs with a detected initiation peak at the start codon position. Peaks are colored according to the precision of the detection (see Methods), from the most precise (p3) to the least precise (p1). Genes and iORFs with no peaks detected are shown in green (p0). (C) Distribution of the ribosome profiling read counts in the first 51 nt of iORFs, excluding the start codon. (D) Proportions of genes or iORFs with a significant in-frame codon periodicity (read phasing in blue) among genes and iORFs with a detected initiation peak. Genes and iORFs with no detected phasing are shown in green. (E) Metagene analysis for significantly high (HE; left) or low (LE; middle) translated genes (gray) and for intergenic tORFs (purple; right). The mean of the 5′ read counts is plotted along the position relative to the start codon for significantly translated genes or tORFs. The lines of the matrix indicate the normalized coverage of genes or tORFs with significant translation signatures, with one feature per line. (A–E) Results for the SpC strain MSH587-1 are shown (for SpA and SpB results, see Supplemental Fig. S3). (F,G) Number of genes or iORFs without (ntORFs; F) or with (tORFs; G) translation signatures detected in at least one of the four strains. Actual numbers are indicated next to each bar. iORFs are classified according to their age (N2, N1, or Term; see Methods) (Fig. 1; Table 1).
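
In-frame codon periodicity (read phasing) means that ribosome profiling reads accumulate preferentially at one position of each codon. The sketch below shows one simple way to summarize it: count reads by codon position and report the fraction falling in frame 0. The function names and the assumption that read positions have already been shifted to the ribosomal P-site are hypothetical; the caption does not say how significance was assessed, so no test is implemented here.

```python
from collections import Counter


def frame_counts(read_positions):
    """Count reads at each codon position (0, 1, 2).

    read_positions: read offsets relative to the first nucleotide of the
    start codon, assumed to be P-site corrected already.
    """
    counts = Counter(pos % 3 for pos in read_positions)
    return [counts.get(frame, 0) for frame in range(3)]


def in_frame_fraction(read_positions):
    """Fraction of reads in frame 0; returns 0.0 when there are no reads."""
    counts = frame_counts(read_positions)
    total = sum(counts)
    return counts[0] / total if total else 0.0
```

A fraction near 1/3 would indicate no phasing, while values well above 1/3 suggest reads follow the annotated reading frame.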
Figure 3.
Putative intergenic polypeptides are less efficiently translated compared with genes. (A–C) Ribosome profiling (RPF start), total RNA (Total RNA start), and translation efficiency (TE start), each given as read counts in the first 60 nt normalized to correct for library size differences and in log2, are displayed for genes (Gen) and tORFs depending on their ages (N2, N1, and Term). Significant differences in pairwise comparisons are displayed above each plot: Wilcoxon test; (***) P-values <0.001, (**) P-values <0.01, and (*) P-values <0.05. Mean estimates per size range are colored in shades of green (from pale for low values to dark green for high values). tORF and gene numbers per size range and age are indicated below the graph. (D) RPF plotted as a function of total RNA for tORFs in purple or for genes in gray. Regression lines are plotted for significant Spearman correlations (P-values <0.05). Expression levels were calculated using the mean of the two replicates.
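
Translation efficiency is commonly computed as the ratio of ribosome-protected fragment (RPF) abundance to total RNA abundance, which becomes a difference on the log2 scale. The sketch below illustrates that calculation under stated assumptions: the counts-per-million scaling, the pseudocount of 1, and all function and parameter names are hypothetical choices for this example, not the normalization used in the paper.

```python
import math


def log2_normalized(count, library_size, pseudocount=1.0, scale=1e6):
    """log2 of a read count scaled to a common library size.

    The per-million scaling and the pseudocount are assumptions.
    """
    return math.log2((count + pseudocount) * scale / library_size)


def translation_efficiency(rpf_count, rna_count,
                           rpf_library_size, rna_library_size,
                           pseudocount=1.0):
    """log2 translation efficiency: normalized RPF minus normalized total RNA."""
    rpf = log2_normalized(rpf_count, rpf_library_size, pseudocount)
    rna = log2_normalized(rna_count, rna_library_size, pseudocount)
    return rpf - rna
```

With equal library sizes, a feature with half as many RPF reads as total RNA reads (after the pseudocount) gets a log2 TE of about -1.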
Figure 4.
Age-dependent characteristics of intergenic polypeptides. (A–E) Sizes (log2 number of residues), mean disorder (ISD), GC%, SNP density, and distance to the closest gene are displayed for genes and tORFs as a function of their age (N2, N1, and Term). Pairwise significant differences are displayed above each plot: Wilcoxon test; (***) P-values <0.001, (**) P-values <0.01, and (*) P-values <0.05. Mean estimates per size range are colored in shades of green (from pale for low values to dark green for high values). (F) Principal component analysis using the number of residues (SIZE in log2), ribosome profiling (RPF), total RNA (TOT) and TE (as read counts in the first 60 nt normalized to correct for library size differences and in log2), intrinsic disorder (ISD), the GC%, and SNP density (SNP). tORFs are colored as a function of their age. (G) Percentage of variance explained by each PCA axis (the first two axes explain 33% and 20% of the variation for a total of 53%).
Figure 5.
A continuous emergence of putative polypeptides in S. paradoxus. Normalized RPF read coverage for a selection of lineage-specific (or group-specific) tORFs per strain. RPF read coverages are displayed for replicates 1 and 2 with a blue or pink area, respectively. The positions of all iORFs (including ntORFs and tORFs) in the genomic area are drawn below each plot. The tORF of interest is labeled with a yellow dot and is plotted in black. iORFs overlapping the tORF of interest are plotted in black when they are in the same reading frame as the selected tORF and in gray when they are in a different reading frame.
Figure 6.
DHFR tagging confirms expression of tORFs. (A) Conceptual figure of the approach. Forty-five tORFs were tagged with a full-length Dhfr—in-frame or out-of-frame in SpA, SpB, and SpC—and then phenotyped by time-resolved imaging and spot-dilution assays. (B) Log2 colony sizes of strains tagged with Dhfr in-frame (y-axis) or out-of-frame (x-axis). The colony size is measured after ∼60 h of growth (shown as a red vertical line in panel A) on medium supplemented with methotrexate. Colors represent the different strains. Canonical genes are tagged in the CTRL strains (SpC strain). Dashed line indicates y = x. (C) Spot-dilution assays further confirm expression of the tORFs and show differential expression of tORF_153359, tORF_159125, and tORF_162702. Fivefold dilutions go from top to bottom. For the corresponding controls in medium not supplemented with methotrexate, see Supplemental Figure S9.
