Structured RNAs and RNA complexes underlie biological processes ranging from control of gene expression to protein translation. Approximately 50% of nucleotides within known structured RNAs are folded into Watson-Crick (WC) base pairs, and sequence changes that preserve these pairs are typically assumed to preserve higher-order RNA structure and binding of macromolecule partners. Here, we report that indirect effects of the helix sequence on RNA tertiary stability are, in fact, significant but are nevertheless predictable from a simple computational model called RNAMake-ΔΔG. When tested through the RNA on a massively parallel array (RNA-MaP) experimental platform, blind predictions for >1500 variants of the tectoRNA heterodimer model system achieve high accuracy (rmsd 0.34 and 0.77 kcal/mol for sequence and length changes, respectively). Detailed comparison of predictions to experiments supports a microscopic picture of how helix sequence changes subtly modulate conformational fluctuations at each base-pair step, which accumulate to impact RNA tertiary structure stability. Our study reveals a previously overlooked phenomenon in RNA structure formation and provides a framework of computation and experiment for understanding helix conformational preferences and their impact across biological RNA and RNA-protein assemblies.
RNA energetics; blind prediction; high-throughput data; indirect readout.
Conflict of interest statement: D.H. and R.D. have coauthored manuscripts with H.M.A.-H. in the past 2 years.
Free energy of tectoRNA binding depends on helix sequence. (A) Structure of tectoRNA homodimer [Protein Data Bank (PDB): 2ADT] with 2 tertiary contacts (GAAA-11nt). One of these tertiary contacts is replaced (GGAA-R1; blue) to convert the complex to the heterodimer used in this study (32). On the right is the sequence and secondary structure of the wild-type tectoRNA interaction. Numbers indicate the "position" within the chip-piece helix. (B) In our experimental setup, one piece of the heterodimer was fluorescently labeled and free in solution (the "flow piece"), while the other was immobilized on the surface of a sequencing chip (the "chip piece"). Quantification of the bound flow piece to the chip surface allowed determination of the free energy of binding (ΔG) to form the bound tectoRNA. (C) Free energy of binding of the flow piece to 7 distinct chip-piece variants. Error bars are 95% CI on the measured ΔG. The sequence of the flow- and chip-piece helices is indicated (Bottom).
Ensemble model for RNA helices allows prediction of tectoRNA assembly energetics. (A, Left) The modeled structure of the unconstrained tectoRNA (i.e., with one contact formed) is shown. The global structure was assembled from the structures of its constituent elements, including the base-pair steps that compose the helical regions. (A, Center) Example base-pair steps are shown for the chip-piece helix. Each base-pair step can adopt an ensemble of many possible conformations, which were derived from examples of that base-pair step in the crystallographic database. (A, Right) Example conformations within the UC/GA conformational ensemble are shown. (B) Starting with the unconstrained tectoRNA as shown in A, a Monte Carlo simulation was performed. At each step of the simulation, the structure of one base-pair step in the tectoRNA was replaced with a new state from its conformational ensemble. The new structure of the unconstrained tectoRNA assembly was evaluated for whether it was "bound" or "unbound," according to the translational and rotational distances from the target base pair to the final base pair. One million steps were performed, and the total numbers of computed bound and unbound tectoRNA conformations were used to calculate the free energy change between the bound and the unbound tectoRNAs (ΔG_conf).
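The counting scheme in panel B can be illustrated with a toy sketch. This is not the RNAMake implementation: it assumes a standard two-state relation, ΔG = −RT ln(N_bound/N_unbound), which the caption does not spell out, and it replaces the translational and rotational distance test with a single scalar per base-pair step. The names `delta_g_conf`, `toy_monte_carlo`, the temperature, and the tolerance are all hypothetical.

```python
import math
import random

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_g_conf(n_bound, n_unbound, temperature_k=298.15):
    """Free energy from state counts, assuming dG = -RT ln(N_bound / N_unbound)."""
    if n_bound <= 0 or n_unbound <= 0:
        raise ValueError("both counts must be positive")
    return -R_KCAL * temperature_k * math.log(n_bound / n_unbound)


def toy_monte_carlo(ensembles, target, tolerance, n_steps, seed=0):
    """Toy sampler: each 'base-pair step' contributes one scalar offset.

    At every iteration one step is swapped for a random member of its
    ensemble, and the summed offset is scored as bound or unbound.
    """
    rng = random.Random(seed)
    state = [rng.choice(ensemble) for ensemble in ensembles]
    n_bound = n_unbound = 0
    for _ in range(n_steps):
        i = rng.randrange(len(ensembles))
        state[i] = rng.choice(ensembles[i])  # replace one step's conformation
        if abs(sum(state) - target) <= tolerance:
            n_bound += 1
        else:
            n_unbound += 1
    return n_bound, n_unbound


if __name__ == "__main__":
    # Hypothetical ensembles: 10 steps, each with 5 possible offsets.
    ensembles = [[-0.2, -0.1, 0.0, 0.1, 0.2] for _ in range(10)]
    nb, nu = toy_monte_carlo(ensembles, target=0.0, tolerance=0.15, n_steps=100_000)
    print(nb, nu, delta_g_conf(nb, nu))
```

In this picture a sequence change alters the ensembles, which shifts the bound fraction and therefore the computed free energy.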
RNAMake-ΔΔG accounts for changes in tectoRNA affinity in a blind prediction challenge. (A) Blind predictions generated with the RNAMake-ΔΔG model agree well with observed values of tectoRNA binding ΔG for 1,536 chip-piece variants (R² = 0.71). Each set of ΔG values is compared with its respective median to obtain ΔΔGs. The red dashed line indicates the best-fit line (slope = 0.54); the gray dotted line indicates the line of slope 1. Inset shows the measured binding affinity curves of 2 chip-piece variants. (B) Example 3D trajectories of the chip-piece helix produced during the Monte Carlo sampling for the 2 variants whose binding curves are shown in A. For each variant, 250 unbound trajectories (light gray) and 100 bound trajectories (dark gray) are shown. All trajectories are aligned by the top base pair. The traces are through the center of each base pair in the helix. (C) Distribution of the terminating base pair of the chip and flow pieces in the partially bound tectoRNA projected on the x-y plane. Distributions were determined using bivariate kernel density estimate smoothing of ∼1,000 bound or partially bound structures sampled from the simulation. The centroids of the distributions are shown as open circles; the black lines connect the centroid of the partially bound structures to the centroid of the bound structures (black dot). (D) Observed (Left) and predicted (Right) affinities for chip-piece helices with the indicated base pair at each position within the helix. Affinities are given as the deviation from the median observed or predicted affinity across all 1,536 variants.
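The median referencing and the fit statistics quoted in panel A can be sketched as follows. This is a minimal illustration using ordinary least squares; the caption does not say which variable is regressed on which, so the axis assignment here is an assumption, and the function names are hypothetical.

```python
from statistics import mean, median


def ddg_from_median(dg_values):
    """Convert a set of dG values to ddG by subtracting the set's own median."""
    m = median(dg_values)
    return [g - m for g in dg_values]


def fit_line(x, y):
    """Ordinary least-squares fit of y on x; returns (slope, intercept, r_squared)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length >= 2")
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Made-up dG values (kcal/mol) purely for illustration.
    observed = ddg_from_median([-10.5, -11.0, -9.8, -10.1])
    predicted = ddg_from_median([-10.2, -10.6, -9.9, -10.0])
    print(fit_line(observed, predicted))
```

Subtracting each set's own median removes any constant offset between predicted and observed ΔG, so the comparison tests only the relative effects of sequence.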
Base-pair conformations differ by position within the helix. (A) Change in sampling frequency of conformational states in the AU/AU ensemble in the bound versus the partially bound tectoRNA. (B) Example structures of base-pair step conformations that are enriched and depleted at 2 positions. (C) Change in positioning between enriched and depleted conformational states at each position of any base-pair type (see SI Appendix, Fig. S8 for other coordinates). Enriched = sampling frequency more than 2-fold higher than expected, and depleted = sampling frequency more than 2-fold lower than expected.
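The enriched/depleted definition above amounts to a simple fold-change threshold. A minimal sketch, assuming that "expected" is a reference sampling frequency supplied by the caller (the caption does not define how it is computed) and using a hypothetical function name:

```python
def classify_state(observed_freq, expected_freq, fold=2.0):
    """Label a conformational state by its fold change in sampling frequency."""
    if expected_freq <= 0:
        raise ValueError("expected_freq must be positive")
    ratio = observed_freq / expected_freq
    if ratio > fold:
        return "enriched"
    if ratio < 1.0 / fold:
        return "depleted"
    return "neutral"


if __name__ == "__main__":
    print(classify_state(0.09, 0.03))  # 3-fold above expectation
    print(classify_state(0.01, 0.03))  # 3-fold below expectation
    print(classify_state(0.04, 0.03))  # within the 2-fold window
```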
Increased prediction accuracy of different length pairs with refinement of the bound state cutoff. (A) Observed versus calculated affinities for chip- and flow-piece variants with altered lengths. The colors indicate the length of the flow- and chip-piece helices. (B) Distribution of the value for Euler angle γ within 2 bound tectoRNA complexes, where γ represents the rotation between the final bp and the target bp around the z axis. The 11-bp chip-piece variant has distinct values for γ compared with the 10-bp chip-piece variant. The vertical dashed line indicates γ = −10°. (C) Structure of the bound complex with the original cutoff (light blue) or a more stringent cutoff (blue) where γ has to be > −10°. (D) Observed and calculated affinities for length-pair combinations with the more stringent cutoff, which excluded overtwisted conformations (i.e., required γ > −10° in the bound complex). The colors indicate the length of the flow- and chip-piece helices as in A; observed values are the same as in A. (E) Observed versus predicted (blind prediction values) affinities of a new set of chip-piece sequences against a distinct 10-bp flow piece using either the original model (open circles) or the updated model with the more stringent cutoff (closed circles).
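The refined cutoff can be pictured as one extra condition on the bound-state test. The sketch below is an illustration only: the distance thresholds `max_trans` and `max_rot`, their units, and the function name are hypothetical; only the γ > −10° requirement comes from the caption.

```python
def is_bound(trans_dist, rot_dist, gamma_deg, max_trans, max_rot, gamma_min_deg=None):
    """Return True if a sampled conformation counts as bound.

    The original criterion uses only the translational and rotational
    distances to the target base pair. Passing gamma_min_deg adds the
    more stringent requirement that the Euler angle gamma exceed it.
    """
    within_distance = trans_dist <= max_trans and rot_dist <= max_rot
    if gamma_min_deg is None:
        return within_distance
    return within_distance and gamma_deg > gamma_min_deg


if __name__ == "__main__":
    # A conformation that passes the distance test but has gamma = -15 degrees.
    sample = dict(trans_dist=1.0, rot_dist=0.1, gamma_deg=-15.0, max_trans=5.0, max_rot=0.5)
    print(is_bound(**sample))                       # original cutoff
    print(is_bound(**sample, gamma_min_deg=-10.0))  # stringent cutoff
```

Conformations rejected by the extra condition move from the bound count to the unbound count, which is how a tighter cutoff changes the calculated affinity.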
Prediction of RNA double helix distortions that occur during ribosomal A-site accommodation and amino acid charging. (A) When complexed with EF-Tu and being loaded into the A-site of the ribosome (the A/T state), Thermus thermophilus tRNA(Thr) appears bent (cyan, PDB: 4V5G) compared with Escherichia coli tRNA(Phe) complexed only with EF-Tu (red, PDB: 1OB2). (B) Overlay of the target fully A/T-bound configuration of the anticodon helix (cyan) and example RNAMake-modeled configuration (gray); the inset shows how scoring occurs between the target base pair from the bound tRNA and the last base pair in the RNAMake-built model. (C) The secondary structure of tRNA and the location of the anticodon helix and acceptor helix (boxed). (D) Predicted dependence of A/T-tRNA(Thr) binding free energy on the sequence of the anticodon helix with the indicated base pair at each position within the helix. Additional heat maps from independently solved structures give indistinguishable sequence dependences (SI Appendix, Fig. S14). RNAMake calculations were performed over all 4^5 anticodon helix sequences (Dataset S4). Rigorous tests of the RNAMake predictions will require high-precision presteady-state or single-molecule measurements that isolate the binding equilibrium of EF-Tu-bound tRNA into the A/T state. (E) tRNA(Asp) from either E. coli (cyan, 1C0A) or yeast (green, 1IL2) adopts a similar conformation when bound to E. coli aspartyl-tRNA synthetase (AspRS). This conformation is bent at the acceptor helix compared with a structure of a partially bound yeast tRNA(Asp) that does not make contact with the synthetase at its acceptor end and was cocrystallized with the bound conformation (red, 1IL2). (F) Overlay of the target fully bound configuration (green) and example RNAMake-modeled configuration (gray); the inset shows how scoring occurs between the target base pair from the bound tRNA and the last base pair in the RNAMake-built model. (G–I) Predicted dependence of tRNA-AspRS binding free energy on the acceptor stem sequence with the indicated base pair at each position within the helix. RNAMake calculations were performed over all 4^7 acceptor helix sequences (Dataset S3). While the predicted effects are small in magnitude, calculations with target-bound conformations drawn from (G) E. coli tRNA/E. coli AspRS (1C0A) and (H) yeast tRNA/E. coli AspRS (1IL2) give similar predicted preferences, with slight differences arising from the slightly different sequences and AspRS-bound structures taken by the 2 tRNAs in nucleotides outside the acceptor stem. The sequence preference map for (I) binding of yeast tRNA(Asp) to the yeast aspartyl-tRNA synthetase (1ASZ) is quite distinct. Reference binding free energies for ΔΔG are based on RNAMake calculations with the E. coli tRNA(Asp) sequence (G and H) and the yeast tRNA(Asp) sequence (I). Note that the scale of effects (0.2 kcal/mol or less) is smaller than the differences in enzymatic rates (1 to 2 kcal/mol) for the few tRNA combinations reported in refs. and 47, suggesting that effects beyond conformational bending account for those results, such as the differences in chemical modification or processing in tRNAs prepared in vivo. Rigorous tests of the RNAMake predictions will require high-precision thermodynamic measurements using in vitro prepared tRNA substrates.
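The sizes of the sequence sets scanned for the anticodon and acceptor helices follow from choosing one of 4 options at each helix position. A small sketch, assuming the 4 options are the canonical Watson-Crick pairs in both orientations (an interpretation of the caption, not something it states); the names below are hypothetical:

```python
from itertools import product

# Assumed alphabet: the 4 Watson-Crick base pairs, written 5'-strand/3'-strand.
WC_PAIRS = ("AU", "UA", "GC", "CG")


def all_helix_sequences(n_positions):
    """Yield every helix as a tuple of base pairs, one pair per position."""
    return product(WC_PAIRS, repeat=n_positions)


if __name__ == "__main__":
    print(sum(1 for _ in all_helix_sequences(5)))  # number of 5-bp helices
    print(sum(1 for _ in all_helix_sequences(7)))  # number of 7-bp helices
```

Exhaustive enumeration is feasible at these sizes, which is what allows a per-position heat map of predicted ΔΔG to be averaged over all sequence contexts.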