Learning cis-regulatory principles of ADAR-based RNA editing from CRISPR-mediated mutagenesis

Nat Commun. 2021 Apr 12;12(1):2165. doi: 10.1038/s41467-021-22489-2.

Abstract

Adenosine-to-inosine (A-to-I) RNA editing, catalyzed by ADAR enzymes, occurs in double-stranded RNAs. Despite a compelling need for a predictive understanding of natural and engineered editing events, how RNA sequence and structure determine editing efficiency and specificity (i.e., cis-regulation) is poorly understood. We apply a CRISPR/Cas9-mediated saturation mutagenesis approach to generate libraries of mutations near three natural editing substrates at their endogenous genomic loci. We use machine learning to integrate diverse RNA sequence and structure features to model editing levels measured by deep sequencing. We confirm known features and identify new features important for RNA editing. Training and testing the XGBoost algorithm within the same substrate yields models that explain 68 to 86 percent of substrate-specific variation in editing levels. However, the models do not generalize across substrates, suggesting complex and context-dependent regulation patterns. Our integrative approach can be applied to larger-scale experiments aimed at deciphering the RNA editing code.
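The two quantities the abstract relies on, a per-variant editing level from deep sequencing and the share of variation a model "explains", can be illustrated with a short sketch. It assumes, as is standard for A-to-I editing, that inosine is read as guanosine after reverse transcription, so the editing level is taken as the G fraction among A and G reads at the edited site; it also assumes "variance explained" means the coefficient of determination (R^2). The function names `editing_level` and `variance_explained` are hypothetical, and the sketch does not train XGBoost or reproduce the paper's pipeline.

```python
"""Hypothetical sketch: per-variant editing levels and variance explained.

Assumptions (not taken from the paper): editing level = G / (A + G) reads
at the edited adenosine, and "variance explained" = R^2 = 1 - SS_res/SS_tot.
"""
from statistics import mean


def editing_level(a_reads: int, g_reads: int) -> float:
    """Fraction of reads showing G (inosine is read as G) at the edited site."""
    total = a_reads + g_reads
    if total == 0:
        raise ValueError("no A or G reads cover the site")
    return g_reads / total


def variance_explained(observed: list[float], predicted: list[float]) -> float:
    """Coefficient of determination R^2 of predictions against observations."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mu = mean(observed)
    ss_tot = sum((y - mu) ** 2 for y in observed)  # total variation around the mean
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))  # unexplained part
    if ss_tot == 0:
        raise ValueError("observed editing levels have zero variance")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy read counts (A reads, G reads) for three hypothetical mutant variants.
    counts = [(900, 100), (600, 400), (250, 750)]
    observed = [editing_level(a, g) for a, g in counts]
    predicted = [0.15, 0.35, 0.70]  # made-up model predictions
    print(f"R^2 = {variance_explained(observed, predicted):.3f}")
```

In this framing, a within-substrate R^2 between 0.68 and 0.86 corresponds to the reported 68 to 86 percent. Because R^2 compares residuals with the spread of the observed values, a model applied to a substrate it was not trained on can score near zero or even below zero, which is one way a failure to generalize across substrates shows up.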

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adenosine Deaminase / metabolism*
  • Algorithms
  • Base Sequence
  • CRISPR-Associated Protein 9 / metabolism
  • Clustered Regularly Interspaced Short Palindromic Repeats / genetics*
  • HEK293 Cells
  • Humans
  • Machine Learning
  • Models, Genetic
  • Mutagenesis / genetics*
  • Mutation / genetics
  • Nucleic Acid Conformation
  • RNA / chemistry
  • RNA / genetics
  • RNA Editing / genetics*
  • Regulatory Sequences, Nucleic Acid / genetics*
  • Substrate Specificity

Substances

  • RNA
  • CRISPR-Associated Protein 9
  • Adenosine Deaminase