AliNA - a deep learning program for RNA secondary structure prediction

Mol Inform. 2023 Dec;42(12):e202300113. doi: 10.1002/minf.202300113. Epub 2023 Nov 2.

Abstract

Numerous natural RNA variants that participate in different cellular processes have now been discovered, alongside artificial RNAs such as aptamers and riboswitches. Investigating their functions, their mechanisms of influence on cells, and their interactions with targets requires the prediction of RNA secondary structures. Classic thermodynamics-based prediction algorithms do not account for the specifics of biological folding, while the deep learning methods designed to address this issue suffer from the problems of homology-based methods. Herein, we present AliNA (ALIgned Nucleic Acids), a deep learning method for RNA secondary structure prediction. Thanks to data augmentation techniques, our method successfully predicts secondary structures for RNA families that are non-homologous to the training data. Augmentation extends existing datasets with easily accessible simulated data. The proposed method shows high prediction quality across different benchmarks, including pseudoknotted structures. The method is freely available on GitHub (https://github.com/Arty40m/AliNA).
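
To make the term "pseudoknot" concrete, the following is a minimal illustrative sketch and is not taken from AliNA. It assumes the structure is written in the widely used extended dot-bracket notation, where different bracket types mark base pairs and "." marks unpaired positions. The function names `parse_dot_bracket` and `has_pseudoknot` are hypothetical. A structure contains a pseudoknot when two base pairs (i, j) and (k, l) cross, that is, i < k < j < l.

```python
def parse_dot_bracket(structure):
    """Return a sorted list of 0-based (i, j) base pairs from extended dot-bracket text."""
    openers = {"(": ")", "[": "]", "{": "}", "<": ">"}
    closers = {close: open_ for open_, close in openers.items()}
    stacks = {open_: [] for open_ in openers}  # one stack per bracket type
    pairs = []
    for pos, ch in enumerate(structure):
        if ch in openers:
            stacks[ch].append(pos)
        elif ch in closers:
            stack = stacks[closers[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    for open_, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched {open_!r} at position {stack[-1]}")
    return sorted(pairs)


def has_pseudoknot(pairs):
    """Return True if any two base pairs cross (i < k < j < l)."""
    ordered = sorted(pairs)  # sorted by opening position, so k > i below
    for idx, (i, j) in enumerate(ordered):
        for k, l in ordered[idx + 1:]:
            if k < j < l:
                return True
    return False


if __name__ == "__main__":
    nested = "((..))"           # simple hairpin, no crossing pairs
    knotted = "((..[[..))..]]"  # square-bracket pairs cross the round ones
    print(has_pseudoknot(parse_dot_bracket(nested)))
    print(has_pseudoknot(parse_dot_bracket(knotted)))
```

The pairwise check is quadratic in the number of base pairs, which is adequate for illustration. Classic thermodynamic algorithms usually restrict themselves to nested structures, which is why pseudoknotted benchmarks are reported separately.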

Keywords: RNA; data augmentation; deep learning; pseudoknots; secondary structure; structure prediction.

MeSH terms

  • Algorithms
  • Deep Learning*
  • Humans
  • Nucleic Acid Conformation
  • RNA* / chemistry
  • RNA* / genetics
  • Sequence Analysis, RNA / methods

Substances

  • RNA