Learning cis-regulatory principles of ADAR-based RNA editing from CRISPR-mediated mutagenesis

Xin Liu; Tao Sun; Anna Shcherbina; Qin Li; Inga Jarmoskaite; Kalli Kappel; Gokul Ramaswami; Rhiju Das; Anshul Kundaje; Jin Billy Li

doi:10.1038/s41467-021-22489-2

Nat Commun. 2021 Apr 12;12(1):2165. doi: 10.1038/s41467-021-22489-2.

Authors

Xin Liu^#¹, Tao Sun^#¹, Anna Shcherbina^#², Qin Li¹, Inga Jarmoskaite¹, Kalli Kappel³, Gokul Ramaswami¹, Rhiju Das^{4,5}, Anshul Kundaje^{6,7}, Jin Billy Li⁸

Affiliations

¹ Department of Genetics, Stanford University, Stanford, CA, USA.
² Department of Biomedical Data Science, Stanford University, Stanford, CA, USA.
³ Biophysics Program, Stanford University, Stanford, CA, USA.
⁴ Department of Biochemistry, Stanford University, Stanford, CA, USA.
⁵ Department of Physics, Stanford University, Stanford, CA, USA.
⁶ Department of Genetics, Stanford University, Stanford, CA, USA. akundaje@stanford.edu.
⁷ Department of Computer Science, Stanford University, Stanford, CA, USA. akundaje@stanford.edu.
⁸ Department of Genetics, Stanford University, Stanford, CA, USA. jin.billy.li@stanford.edu.

^# Contributed equally.

Abstract

Adenosine-to-inosine (A-to-I) RNA editing catalyzed by ADAR enzymes occurs in double-stranded RNAs. Despite a compelling need for a predictive understanding of natural and engineered editing events, how RNA sequence and structure determine editing efficiency and specificity (i.e., cis-regulation) is poorly understood. We apply a CRISPR/Cas9-mediated saturation mutagenesis approach to generate libraries of mutations near three natural editing substrates at their endogenous genomic loci. We use machine learning to integrate diverse RNA sequence and structure features to model editing levels measured by deep sequencing. We confirm known features and identify new features important for RNA editing. Training and testing the XGBoost algorithm within the same substrate yields models that explain 68 to 86 percent of substrate-specific variation in editing levels. However, the models do not generalize across substrates, suggesting complex and context-dependent regulation patterns. Our integrative approach can be applied to larger-scale experiments towards deciphering the RNA editing code.
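The two quantities the abstract reports can be illustrated with a short sketch. It assumes that an editing level is the fraction of sequencing reads showing G at the edited adenosine (inosine is read as G), and that "percent of variation explained" refers to the coefficient of determination, R². The function names and any inputs are hypothetical and are not taken from the paper; the sketch does not reproduce the authors' XGBoost pipeline or feature set.

```python
from typing import Sequence


def editing_level(a_reads: int, g_reads: int) -> float:
    """Fraction of reads showing G at an editing site (assumed definition)."""
    total = a_reads + g_reads
    if total == 0:
        raise ValueError("no read coverage at this site")
    return g_reads / total


def variance_explained(observed: Sequence[float],
                       predicted: Sequence[float]) -> float:
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with >= 2 values")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error left after the model's predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observed editing levels.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot
```

Under these assumptions, a within-substrate evaluation would compute `variance_explained` on held-out mutants of the same substrate used for training, while a cross-substrate evaluation would compute it on mutants of a different substrate.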

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Adenosine Deaminase / metabolism*
Algorithms
Base Sequence
CRISPR-Associated Protein 9 / metabolism
Clustered Regularly Interspaced Short Palindromic Repeats / genetics*
HEK293 Cells
Humans
Machine Learning
Models, Genetic
Mutagenesis / genetics*
Mutation / genetics
Nucleic Acid Conformation
RNA / chemistry
RNA / genetics
RNA Editing / genetics*
Regulatory Sequences, Nucleic Acid / genetics*
Substrate Specificity

Substances

RNA
CRISPR-Associated Protein 9
Adenosine Deaminase
