NAR Genom Bioinform. 2019 Jul 5;1(1):e2.
doi: 10.1093/nargab/lqz002. eCollection 2019 Apr.

Conserved regions in long non-coding RNAs contain abundant translation and protein-RNA interaction signatures

Jorge Ruiz-Orera et al. NAR Genom Bioinform.

Abstract

The mammalian transcriptome includes thousands of transcripts that do not correspond to annotated protein-coding genes and that are known as long non-coding RNAs (lncRNAs). A handful of lncRNAs have well-characterized regulatory functions but the biological significance of the majority of them is not well understood. LncRNAs that are conserved between mice and humans are likely to be enriched in functional sequences. Here, we investigate the presence of different types of ribosome profiling signatures in lncRNAs and how they relate to sequence conservation. We find that lncRNA-conserved regions contain three times more ORFs with translation evidence than non-conserved ones, and identify nine cases that display significant sequence constraints at the amino acid sequence level. The study also reveals that conserved regions in intergenic lncRNAs are significantly enriched in protein-RNA interaction signatures when compared to non-conserved ones; this includes sites in well-characterized lncRNAs, such as Cyrano, Malat1, Neat1 and Meg3, as well as in tens of lncRNAs of unknown function. This work illustrates how the analysis of ribosome profiling data coupled with evolutionary analysis provides new opportunities to explore the lncRNA functional landscape.

Figures

Figure 1.
Transcriptome-wide identification of conserved sequences, promoters and Ribo-Seq associations. (A) Fraction of mouse genes that showed conservation in humans using BLASTN (Conservation), that overlapped with annotated promoter regions (Promoter), or that were covered by Ribo-Seq reads (Ribo-Seq). The percentage of genes with at least one feature, and the total sequence covered, is indicated. Data are for expressed codRNAs and lncRNAs in the hippocampus (sequences with a minimum RNA-Seq coverage of 56.38 reads/kb). (B) Analysis of feature coverage in equally-sized fractions of the genes, from 5′ (p1) to 3′ (p5). Grey bars represent the mean proportion of a shuffled control where the different features per gene were randomly shuffled along the sequence 1000 times. Error bars represent the standard error of the proportion.
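The positional analysis in panel B can be illustrated with a minimal sketch. It assumes each gene is represented by its length and a list of half-open feature intervals, splits the sequence into five equal fractions (p1 to p5), and averages 1000 random re-placements of the same features as a control. The function names, the interval representation and the uniform placement scheme are assumptions made for illustration; the caption does not specify how the authors implemented the shuffling.

```python
import random

def coverage_by_fraction(length, intervals, n_bins=5):
    """Fraction of each equally sized bin (5' to 3') covered by features."""
    covered = [False] * length
    for start, end in intervals:
        for pos in range(max(0, start), min(length, end)):
            covered[pos] = True
    fractions = []
    for i in range(n_bins):
        lo = i * length // n_bins
        hi = (i + 1) * length // n_bins
        width = hi - lo
        fractions.append(sum(covered[lo:hi]) / width if width else 0.0)
    return fractions

def shuffled_control(length, intervals, n_bins=5, n_shuffles=1000, seed=0):
    """Mean per-bin coverage after re-placing every feature at random.

    Assumption: each feature keeps its size and receives a uniformly
    random start position; placed features are allowed to overlap.
    """
    rng = random.Random(seed)
    totals = [0.0] * n_bins
    for _ in range(n_shuffles):
        placed = []
        for start, end in intervals:
            size = min(end - start, length)
            new_start = rng.randint(0, length - size)
            placed.append((new_start, new_start + size))
        for i, value in enumerate(coverage_by_fraction(length, placed, n_bins)):
            totals[i] += value
    return [total / n_shuffles for total in totals]

# Hypothetical gene of 100 nt with one feature at the 5' end.
observed = coverage_by_fraction(100, [(0, 20)])
control = shuffled_control(100, [(0, 20)], n_shuffles=200)
```

Comparing `observed` with `control` bin by bin shows whether a feature is concentrated in a particular part of the gene beyond what random placement would give.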
Figure 2.
Effect of conservation across lncRNA types. (A) Number and fraction of different categories based on position and sequence features. Antisense: exonic overlap, expression from a bidirectional promoter, and/or annotated as antisense; ncRNA host: genes with at least one small RNA sequence found in the exonic region; intergenic: the rest of the genes. Conserved genes are enriched in antisense and ncRNA host genes. (B) Percentage of total sequence that is covered by Ribo-Seq reads (1 or more reads), and by annotated promoter cores, for conserved and non-conserved regions in codRNAs and lncRNAs. Conserved lncRNA regions showed a significantly higher proportion of all features than non-conserved regions or than expected by chance (test of equal proportions; *** P-value < 10⁻⁵). Error bars represent the standard error of the proportion. Categories: A: Antisense; I: Intergenic; H: ncRNA host.
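To make the statistics in panel B concrete, here is a small sketch of a test of equal proportions and of the standard error of a proportion. It uses a pooled two-proportion z-test without continuity correction, which is one common form of this test; the caption does not say which implementation the authors used, and the counts in the usage lines are made-up numbers, not data from the study.

```python
import math

def proportion_standard_error(hits, total):
    """Standard error of a proportion, as drawn by the error bars."""
    p = hits / total
    return math.sqrt(p * (1.0 - p) / total)

def two_proportion_z_test(hits_a, total_a, hits_b, total_b):
    """Two-sided pooled z-test for equality of two proportions.

    Returns (z, p_value). No continuity correction is applied.
    """
    p_a = hits_a / total_a
    p_b = hits_b / total_b
    pooled = (hits_a + hits_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / total_a + 1.0 / total_b))
    if se == 0.0:
        # Both samples are all hits or all misses: no evidence of a difference.
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Hypothetical counts: covered positions out of total positions.
z, p_value = two_proportion_z_test(141, 1000, 57, 1000)
significant = p_value < 1e-5  # the *** threshold used in the caption
```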
Figure 3.
Identification of translated open reading frames and ribonucleoproteins. (A) Workflow to identify translated open reading frames (ORFs), putative functional proteins and ribonucleoproteins (RNPs). Ribosome profiling (Ribo-Seq) reads are mapped to candidate gene regions, and ORFs with a RibORF score ≥ 0.7 are defined as translated. The remaining regions with an Rfoot uniformity score < 0.6 and a FLOSS score ≥ 0.35 are defined as RNPs. Next, human ORF syntenic regions are extracted with LiftOver and aligned with PRANK, when possible. Truncated alignments are those for which >50% of the ORF was aligned, or the gap limit is exceeded (33% or 10-nt). Finally, non-truncated alignments are checked for purifying selection signatures with Codeml to identify putative constrained peptides or proteins (dN/dS ratio < 1; Chi-square test of dN/dS ratio, P-value < 0.05). (B) Fraction and number of conserved and non-conserved codRNAs and lncRNAs that contain at least one translated open reading frame (ORF), ribonucleoprotein (RNP), both features (ORF+RNP), or neither of the two features. (C) Percentage of total sequence that is covered by translated ORFs and RNPs, for conserved and non-conserved regions. Overall, about 14.1% of the total conserved region in lncRNAs contained ORFs predicted to be translated (122 ORFs), compared to 5.65% for non-conserved regions (370 ORFs). Test of equal proportions: * P-value < 0.05; *** P-value < 10⁻⁵. Error bars represent the standard error of the proportion. Categories: A: Antisense; I: Intergenic; H: ncRNA host. (D) Example of a functionally characterized lncRNA, Cyrano, with RNA-Seq, Ribo-Seq and annotated CLIP-Seq peaks (RBFOX and CELF4). Predicted conserved regions (CONS), ORFs and RNPs are also displayed. There is high agreement between CLIP-Seq peaks and Ribo-Seq RNPs. * Location of a previously described miRNA-binding site.
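The decision rules in the panel A workflow can be written as a short sketch. The thresholds are taken from the caption; the `RegionScores` record, the function names and the `"other"` label are hypothetical and only illustrate how the cut-offs combine. The scores themselves would come from the external tools named in the caption (RibORF, Rfoot, FLOSS, Codeml), which are not called here.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds as stated in the Figure 3 caption.
RIBORF_MIN = 0.7
RFOOT_UNIFORMITY_MAX = 0.6
FLOSS_MIN = 0.35
DNDS_MAX = 1.0
P_VALUE_MAX = 0.05

@dataclass
class RegionScores:
    """Precomputed scores for one Ribo-Seq covered region (hypothetical record)."""
    riborf: Optional[float]   # None when the region holds no candidate ORF
    rfoot_uniformity: float
    floss: float

def classify_region(scores: RegionScores) -> str:
    """Label a region as translated ORF, RNP, or other."""
    if scores.riborf is not None and scores.riborf >= RIBORF_MIN:
        return "ORF"
    # Only regions not called as translated are tested for RNP signatures.
    if scores.rfoot_uniformity < RFOOT_UNIFORMITY_MAX and scores.floss >= FLOSS_MIN:
        return "RNP"
    return "other"

def is_constrained(dn_ds: float, p_value: float) -> bool:
    """Purifying selection filter applied to non-truncated alignments."""
    return dn_ds < DNDS_MAX and p_value < P_VALUE_MAX

# Hypothetical score values for illustration only.
label = classify_region(RegionScores(riborf=0.82, rfoot_uniformity=0.9, floss=0.1))
```

Checking the RibORF score first mirrors the caption's wording, in which RNPs are defined among the regions that are not called as translated.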
Figure 4.
LncRNAs have more heterogeneous Ribo-Seq read lengths. (A) Fraction of sequence covered by Ribo-Seq that contains reads of a specific length, for conserved and non-conserved regions in different categories of lncRNAs. While antisense lncRNAs resemble codRNAs in the read distribution, intergenic and ncRNA host regions contain a higher proportion of short and long reads corresponding to non-ribosome associates. (B) Ribo-Seq read density for regions predicted as ribonucleoproteins (RNP), translated sequences (ORF) and other regions covered by Ribo-Seq. ORFs in codRNAs have a higher read density than the rest of the sequences (***: Wilcoxon test, P-value < 10⁻⁵).
