RNA. 2018 Nov;24(11):1555-1567.
doi: 10.1261/rna.066324.118. Epub 2018 Aug 10.

Accelerated RNA Secondary Structure Design Using Preselected Sequences for Helices and Loops

Free PMC article


Stanislav Bellaousov et al. RNA.

Abstract

Nucleic acids can be designed to be nano-machines, pharmaceuticals, or probes. RNA secondary structures can form the basis of self-assembling nanostructures. Because there are only four natural RNA bases, it can be difficult to design sequences that fold to a single, specified structure: many alternative structures are often possible for a given sequence. One approach taken by state-of-the-art sequence design methods is to select sequences that fold to the specified structure using stochastic, iterative refinement. The goal of this work is to accelerate design. Many existing iterative methods select and refine sequences one base pair and one unpaired nucleotide at a time. Here, the hypothesis that design can be accelerated by preselecting sequences was tested. To this aim, a database was built of helix sequences that exhibit thermodynamic features found in natural sequences and that have little tendency to cross-hybridize. Additionally, a database was assembled of RNA loop sequences with low helix-formation propensity and little tendency to cross-hybridize with either the helices or other loops. These databases of preselected sequences accelerate the selection of sequences that fold with minimal ensemble defect by replacing some of the trial and error of current refinement approaches. When the database of preselected sequences is used instead of randomly chosen sequences, sequences for natural structures are designed 36 times faster, and sequences for random structures are designed six times faster. The sequences selected with the aid of the database have ensemble defects similar to those of sequences selected at random. The sequence database is part of the RNAstructure package at http://rna.urmc.rochester.edu/RNAstructure.html.
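The design objective above is ensemble defect: the expected number of nucleotides whose pairing state in the Boltzmann ensemble differs from the target structure. The figures report it divided by sequence length (normalized ensemble defect, NED). Below is a minimal Python sketch of that standard definition. It assumes that a base pair probability matrix is already available from a partition function calculation, which this sketch does not perform. The function names are illustrative and are not RNAstructure's API.

```python
from typing import List, Sequence


def dot_bracket_to_pairs(db: str) -> List[int]:
    """Partner index for each nucleotide (-1 if unpaired) from dot-bracket notation."""
    stack: List[int] = []
    pairs = [-1] * len(db)
    for i, ch in enumerate(db):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    return pairs


def ensemble_defect(pair_prob: Sequence[Sequence[float]], target: Sequence[int]) -> float:
    """Expected number of nucleotides not in their target pairing state.

    pair_prob[i][j] is the equilibrium probability that i pairs with j
    (symmetric, zero diagonal); target[i] is i's partner or -1 if unpaired.
    """
    n = len(target)
    expected_correct = 0.0
    for i in range(n):
        if target[i] == -1:
            # probability that i is unpaired = 1 - sum of its pairing probabilities
            expected_correct += 1.0 - sum(pair_prob[i])
        else:
            expected_correct += pair_prob[i][target[i]]
    return n - expected_correct


def normalized_ensemble_defect(pair_prob: Sequence[Sequence[float]], target: Sequence[int]) -> float:
    """Ensemble defect divided by sequence length (0 = perfect, 1 = worst)."""
    return ensemble_defect(pair_prob, target) / len(target)
```

Consider a hypothetical matrix for the target `((..))` in which the two target pairs have probabilities 0.9 and 0.8 and no other pairs form. The per-nucleotide defects are 0.1 + 0.2 + 0 + 0 + 0.2 + 0.1. This gives an ensemble defect of 0.6 and an NED of 0.1.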

Keywords: RNA folding thermodynamics; RNA partition function; RNA sequence design; ensemble defect.

Figures

FIGURE 1.
Trends in natural sequences. Distributions of folding free energy change, ensemble folding free energy change, ensemble defect, and structure probability for 7 bp helices. Cumulative distributions are shown in red for unique sequences observed in the database of RNA structures and in blue for all possible helices. (A) Gibbs free energy change; (B) ensemble Gibbs free energy change; (C) ensemble defect; (D) probability of helix formation.
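Each curve in Figure 1 is a cumulative distribution: for a value on the horizontal axis, it gives the fraction of helices at or below that value. Below is a minimal Python sketch of that computation. It assumes the per-helix quantities (for example, folding free energy changes) have already been computed by a thermodynamic model, which this sketch does not do. The function names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def ecdf(values: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Empirical cumulative distribution: for each distinct value x (ascending),
    the fraction of observations that are <= x."""
    xs = sorted(values)
    n = len(xs)
    distinct = sorted(set(xs))
    return distinct, [bisect_right(xs, x) / n for x in distinct]


def fraction_at_or_below(values: Sequence[float], threshold: float) -> float:
    """Height of the empirical CDF at `threshold`."""
    xs = sorted(values)
    return bisect_right(xs, threshold) / len(xs)
```

Evaluating `fraction_at_or_below` on two sets at the same threshold, such as natural helices and all possible helices, compares the red and blue curves at one point on the axis.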
FIGURE 2.
Algorithm performance distribution for sets of parameters. Blue and red dots represent the performance of the Design algorithm in Preselected and Random modes, respectively. Green dots show the performance of the Design algorithm in Preselected mode using all adenines in single-stranded regions. Each dot is the mean performance for a single set of parameters. Black outlines mark the parameter sets used for further performance evaluation on a different set of structures. Performance is shown as mean time as a function of mean normalized ensemble defect (NED).
FIGURE 3.
Time performance for long sequences. Designs were made for sequences of up to 1995 nt (Supplemental Table S7). Mean run time over ten calculations is shown for each target structure. Design times were capped at 75 d of running time (6,480,000 s). Points are missing for NUPACK (697, 1793, and 1995 nt) and Design_Random (1793 and 1995 nt) because one or more designs reached the maximum runtime and were terminated, so the mean could not be calculated. Design_Random and Design_Preselected were run with default parameters. NUPACK was run using rna99 thermodynamic parameters at 37°C, with the NED threshold set to 0.1 so that NUPACK produced designs with NED similar to those of Design_Random; other parameters were set to defaults. Designs were performed on a single core of an Opteron 2427 processor.
FIGURE 4.
Generating the databases of RNA helices and loops. A list of all possible helices 3 to 10 bp long, composed only of canonical A-U and G-C base pairs, was trimmed by removing helices with more than three consecutive base pair repeats, helices with strands that form intramolecular pairs, and helices with high ensemble defect. The list was then trimmed further by removing helices with a high propensity to cross-hybridize with other helices in the list. To generate a list of loop sequences, all possible sequences 3 to 10 nt long were trimmed by removing sequences with more than three consecutive repeating nucleotides and sequences that can form intramolecular pairs. This list was trimmed further by removing sequences with a high propensity to pair with the final helix list and sequences that can form self-complementary duplexes. Finally, sequences with a high propensity to cross-hybridize were removed.
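The filtering pipeline in Figure 4 follows an enumerate-then-filter pattern. Below is a Python sketch of that pattern under heavily simplified assumptions. The paper's criteria for intramolecular pairing, ensemble defect, and cross-hybridization propensity are thermodynamic and need a folding model. Here they are replaced by a hypothetical sequence-only proxy: the longest run of consecutive Watson-Crick pairs two strands could form, compared against an arbitrary threshold `max_stretch`. The sketch reads "consecutive base pair repeats" as identical base pairs in a row. It omits the helix ensemble-defect filter.

```python
from itertools import product
from typing import List

# Canonical Watson-Crick partners only (A-U, G-C); G-U wobble pairs are ignored.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def max_run(seq: str) -> int:
    """Length of the longest run of identical consecutive characters."""
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def reverse_complement(seq: str) -> str:
    return "".join(PAIR[b] for b in reversed(seq))


def longest_duplex(a: str, b: str) -> int:
    """Longest stretch of `a` that forms consecutive Watson-Crick pairs with an
    antiparallel stretch of `b` (= longest common substring of a and revcomp(b))."""
    rc = reverse_complement(b)
    best, prev = 0, [0] * (len(rc) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(rc) + 1)
        for j in range(1, len(rc) + 1):
            if a[i - 1] == rc[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def all_sequences(length: int) -> List[str]:
    return ["".join(p) for p in product("ACGU", repeat=length)]


def helix_candidates(n_bp: int) -> List[str]:
    """A helix is represented by its 5' strand (the partner strand is its reverse
    complement). Drop helices with more than three identical base pairs in a row."""
    return [s for s in all_sequences(n_bp) if max_run(s) <= 3]


def loop_candidates(length: int, helices: List[str], max_stretch: int = 2) -> List[str]:
    """Drop loops with more than three identical nucleotides in a row, loops that
    can pair with a copy of themselves, and loops that can pair with a helix strand."""
    helix_strands = helices + [reverse_complement(h) for h in helices]
    keep = []
    for s in all_sequences(length):
        if max_run(s) > 3 or longest_duplex(s, s) > max_stretch:
            continue
        if any(longest_duplex(s, h) > max_stretch for h in helix_strands):
            continue
        keep.append(s)
    return keep


def strands_of(seq: str, is_helix: bool) -> List[str]:
    return [seq, reverse_complement(seq)] if is_helix else [seq]


def greedy_orthogonal(cands: List[str], max_stretch: int = 2, is_helix: bool = False) -> List[str]:
    """Keep a candidate only if none of its strands can pair over more than
    `max_stretch` consecutive positions with a strand of an already-kept entry."""
    kept: List[str] = []
    for s in cands:
        mine = strands_of(s, is_helix)
        if all(longest_duplex(a, b) <= max_stretch
               for k in kept for a in mine for b in strands_of(k, is_helix)):
            kept.append(s)
    return kept
```

A toy run on short lengths would be `greedy_orthogonal(loop_candidates(4, helix_candidates(3)))`. Because the final step is greedy, the result depends on the order in which candidates are visited.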
FIGURE 5.
Hierarchical structure decomposition. Black arrows show the decomposition of the structure into branches and leaves. Gray arrows show the merging of leaves or branches. The dotted line shows where the structure is decomposed.
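Figure 5 describes splitting a target structure into branches and leaves, then merging them back. The exact splitting rule used by Design is not given here. As a hypothetical illustration only, the Python sketch below parses a pseudoknot-free dot-bracket string into a tree of helices. In this tree, helices that close hairpins are leaves. The traversal visits leaves before the branches that contain them, which is one plausible merge order.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

Pair = Tuple[int, int, list]  # (i, j, directly enclosed pairs)


@dataclass
class Helix:
    start: int   # 5' nucleotide of the outermost base pair
    end: int     # 3' nucleotide of the outermost base pair
    length: int  # number of directly stacked base pairs
    children: List["Helix"] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        # A helix enclosing no other helix closes a hairpin loop.
        return not self.children


def parse_pairs(db: str) -> List[Pair]:
    """Nested base pairs from a pseudoknot-free dot-bracket string."""
    roots: List[Pair] = []
    stack: List[Tuple[int, list]] = []
    for k, ch in enumerate(db):
        if ch == "(":
            stack.append((k, []))
        elif ch == ")":
            i, kids = stack.pop()
            pair = (i, k, kids)
            if stack:
                stack[-1][1].append(pair)
            else:
                roots.append(pair)
    return roots


def collapse(pair: Pair) -> Helix:
    """Merge directly stacked pairs (i, j), (i+1, j-1), ... into one Helix node."""
    i, j, kids = pair
    length = 1
    while len(kids) == 1 and kids[0][0] == i + length and kids[0][1] == j - length:
        kids = kids[0][2]
        length += 1
    return Helix(i, j, length, [collapse(c) for c in kids])


def post_order(h: Helix) -> Iterator[Helix]:
    """Yield leaves before the branches that contain them (merge order)."""
    for c in h.children:
        yield from post_order(c)
    yield h
```

For `((((...))((...))))`, the traversal yields the two hairpin helices (leaves) first. It then yields the enclosing helix that joins them into a multibranch loop.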
