In-depth characterization and identification of translatable lncRNAs

Comput Biol Med. 2023 Sep:164:107243. doi: 10.1016/j.compbiomed.2023.107243. Epub 2023 Jul 8.

Abstract

Long non-coding RNAs (lncRNAs) are transcripts longer than 200 nucleotides that do not code for proteins. Deep sequencing technologies have revealed that lncRNAs can harbor translatable short open reading frames (sORFs). Yet the regulatory mechanisms governing lncRNA translation remain poorly understood. Here, we comprehensively examined the sequence, functional element, and structural features relevant to lncRNA translation in humans. Extensive identification and analysis revealed that, compared with untranslatable lncRNAs, translatable lncRNAs have richer protein-coding-related sequence features, features associated with both cap-dependent and cap-independent translation initiation, and more stable secondary structures. These findings strongly support the view that lncRNAs serve as a repository for the production of new small peptides. By fusing the features that affect translation and applying the extreme gradient boosting (XGBoost) algorithm, we developed TransLncPred, the first computational tool dedicated to predicting translatable lncRNAs. Benchmark results show that our method outperforms several state-of-the-art RNA coding potential prediction tools on the same training and testing datasets. Ten-fold cross-validation repeated 100 times also demonstrates that regulatory element-derived features, especially those for N7-methylguanosine (m7G) and internal ribosome entry sites (IRES), contribute to the improved predictive performance.
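The evaluation protocol mentioned above (feature groups fused into one vector per transcript, then scored with 10-fold cross-validation repeated 100 times) can be illustrated with a minimal standard-library sketch. It assumes plain (unstratified) folds and accuracy as the metric. The helper names (`fuse_features`, `repeated_kfold`, `repeated_cv_accuracy`, `nearest_centroid`) are hypothetical, and the nearest-centroid classifier is only a stand-in for XGBoost. This is not the TransLncPred implementation.

```python
import random
from statistics import mean


def fuse_features(*groups):
    """Concatenate per-sample vectors from several feature groups
    (e.g. hypothetical sequence, regulatory-element and structure groups)."""
    return [[v for g in groups for v in g[i]] for i in range(len(groups[0]))]


def repeated_kfold(n, k, repeats, seed=0):
    """Yield (repeat, train_idx, test_idx) for k-fold CV repeated `repeats` times.
    Assumption: plain shuffled folds, not stratified by class."""
    if n < k:
        raise ValueError("need at least k samples so no fold is empty")
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)                          # fresh partition per repeat
        folds = [idx[i::k] for i in range(k)]     # fold sizes differ by <= 1
        for i, test in enumerate(folds):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield r, train, test


def nearest_centroid(X_train, y_train, X_test):
    """Toy stand-in classifier (NOT XGBoost): predict the label whose
    training-set centroid is closest in squared Euclidean distance."""
    centroids = {}
    for label in set(y_train):
        rows = [x for x, lab in zip(X_train, y_train) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return [min(centroids, key=lambda c: dist(x, centroids[c])) for x in X_test]


def repeated_cv_accuracy(X, y, fit_predict, k=10, repeats=100, seed=0):
    """Average test-fold accuracy over all k * repeats train/test splits."""
    scores = []
    for _, train, test in repeated_kfold(len(X), k, repeats, seed):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        scores.append(sum(p == y[i] for p, i in zip(preds, test)) / len(test))
    return mean(scores)
```

In use, each feature group is a list of per-transcript vectors, for example `X = fuse_features(seq_feats, element_feats)`, after which `repeated_cv_accuracy(X, labels, nearest_centroid)` gives the mean accuracy over 1,000 splits. Any classifier that follows the same `fit_predict(X_train, y_train, X_test)` shape could be substituted. Comparing runs with and without a feature group is one way to estimate how much that group contributes.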

Keywords: Long non-coding RNA; Regulatory element; Secondary structure; Small peptides; sORFs.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Humans
  • Open Reading Frames
  • RNA, Long Noncoding* / genetics

Substances

  • RNA, Long Noncoding