RNAtranslator: Modeling protein-conditional RNA design as sequence-to-sequence natural language translation

PLoS Comput Biol. 2025 Oct 3;21(10):e1013541. doi: 10.1371/journal.pcbi.1013541. eCollection 2025 Oct.

Abstract

Protein-RNA interactions are essential in gene regulation, splicing, RNA stability, and translation, which makes RNA a promising therapeutic agent for targeting proteins, including those considered undruggable. However, designing RNA sequences that selectively bind to proteins remains a significant challenge because of the vast sequence space and the limitations of current experimental and computational methods. Traditional approaches rely on in vitro selection techniques or on computational models that require post-generation optimization, which restricts their applicability to well-characterized proteins. We introduce RNAtranslator, a generative language model that, for the first time, formulates protein-conditional RNA design as a sequence-to-sequence natural language translation problem. By learning a joint representation of RNA-protein interactions from large-scale datasets, RNAtranslator directly generates binding RNA sequences for any given protein target without additional optimization. Our results demonstrate that RNAtranslator produces RNA sequences with natural-like properties, high novelty, and enhanced binding affinity compared to existing methods. This approach enables efficient RNA design for a wide range of proteins, including proteins with no available RNA-interaction data, paving the way for new RNA-based therapeutics and synthetic biology applications.
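Treating design as translation means each training example is a (source, target) pair. The protein sequence plays the role of the source sentence and a binding RNA plays the role of the target sentence. The following is a minimal sketch of that data framing, assuming character-level tokens and one shared vocabulary. The token names, the `p_`/`r_` prefixes, `build_vocab`, and `encode_pair` are hypothetical illustrations and are not taken from RNAtranslator's actual tokenizer or code.

```python
from dataclasses import dataclass

# Hypothetical special tokens; RNAtranslator's real vocabulary may differ.
PAD, BOS, EOS = "<pad>", "<s>", "</s>"
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids
NUCLEOTIDES = "ACGU"                  # RNA alphabet


def build_vocab() -> dict:
    """Map special tokens, amino acids, and nucleotides to integer ids.

    Protein and RNA letters overlap (A, C, G), so each alphabet gets a
    prefix to keep the two "languages" distinct in one shared vocabulary.
    """
    tokens = [PAD, BOS, EOS]
    tokens += [f"p_{aa}" for aa in AMINO_ACIDS]
    tokens += [f"r_{nt}" for nt in NUCLEOTIDES]
    return {tok: i for i, tok in enumerate(tokens)}


VOCAB = build_vocab()


@dataclass
class TranslationPair:
    source_ids: list  # encoder input: the protein "sentence"
    target_ids: list  # decoder target: the binding RNA "sentence"


def encode_pair(protein: str, rna: str) -> TranslationPair:
    """Turn one protein-RNA interaction into a (source, target) example."""
    src = [VOCAB[f"p_{aa}"] for aa in protein.upper()] + [VOCAB[EOS]]
    # The decoder target starts with BOS so that generation can be seeded
    # with BOS alone and then extended one nucleotide at a time.
    tgt = [VOCAB[BOS]] + [VOCAB[f"r_{nt}"] for nt in rna.upper()] + [VOCAB[EOS]]
    return TranslationPair(src, tgt)


if __name__ == "__main__":
    pair = encode_pair("MKV", "ACGU")
    print(pair.source_ids)  # [13, 11, 20, 2]
    print(pair.target_ids)  # [1, 23, 24, 25, 26, 2]
```

With pairs encoded this way, an encoder-decoder model can be trained on the target ids like any translation model. At inference time, only a protein sequence is supplied, and an RNA is decoded from it with no separate optimization loop. The exact architecture and training objective used by RNAtranslator are not specified here.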

MeSH terms

  • Base Sequence
  • Computational Biology / methods
  • Humans
  • Protein Binding
  • Protein Biosynthesis
  • RNA* / chemistry
  • RNA* / genetics
  • RNA* / metabolism
  • RNA-Binding Proteins* / chemistry
  • RNA-Binding Proteins* / genetics
  • RNA-Binding Proteins* / metabolism

Substances

  • RNA
  • RNA-Binding Proteins