RNA. 2016 Dec;22(12):1808-1818.
doi: 10.1261/rna.053694.115. Epub 2016 Oct 19.

Exact Calculation of Loop Formation Probability Identifies Folding Motifs in RNA Secondary Structures

Free PMC article

Michael F Sloma et al. RNA.

Abstract

RNA secondary structure prediction is widely used to analyze RNA sequences. In an RNA partition function calculation, free energy nearest neighbor parameters are used in a dynamic programming algorithm to estimate statistical properties of the secondary structure ensemble. Previously, partition functions have largely been used to estimate the probability that a given pair of nucleotides form a base pair, the conditional stacking probability, the accessibility to binding of a continuous stretch of nucleotides, or a representative sample of RNA structures. Here it is demonstrated that an RNA partition function can also be used to calculate the exact probability of formation of hairpin loops, internal loops, bulge loops, or multibranch loops at a given position. This calculation can also be used to estimate the probability of formation of specific helices. Benchmarking on a set of RNA sequences with known secondary structures indicated that loops that were calculated to be more probable were more likely to be present in the known structure than less probable loops. Furthermore, highly probable loops are more likely to be in the known structure than the set of loops predicted in the lowest free energy structures.

Keywords: RNA folding thermodynamics; RNA secondary structure; coaxial stacking; partition function; stochastic sampling.
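As a rough illustration of what an "exact" loop probability means, the sketch below enumerates every secondary structure of a very short sequence and sums Boltzmann weights directly. It is not the dynamic programming algorithm or the nearest neighbor model described in the abstract: the per-pair energy `PAIR_ENERGY`, the value of `RT`, and all function names are assumptions chosen only to keep the example small.

```python
import math

RT = 0.616          # kcal/mol near 37 C (assumed value)
PAIR_ENERGY = -1.0  # hypothetical toy energy per base pair, kcal/mol
MIN_HAIRPIN = 3     # minimum number of unpaired nucleotides in a hairpin loop
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}


def structures(seq, i, j):
    """Yield each non-crossing structure on seq[i..j] once, as a frozenset of pairs."""
    if i >= j:
        yield frozenset()
        return
    # Case 1: nucleotide i is unpaired.
    yield from structures(seq, i + 1, j)
    # Case 2: nucleotide i pairs with k, which splits the interval in two.
    for k in range(i + MIN_HAIRPIN + 1, j + 1):
        if seq[i] + seq[k] in CANONICAL:
            for inner in structures(seq, i + 1, k - 1):
                for outer in structures(seq, k + 1, j):
                    yield inner | outer | {(i, k)}


def weight(structure):
    """Boltzmann weight exp(-E/RT) under the toy energy model."""
    return math.exp(-PAIR_ENERGY * len(structure) / RT)


def has_hairpin(structure, i, j):
    """True if i pairs with j and no nucleotide strictly between them is paired."""
    return (i, j) in structure and not any(i < p < j for p, _ in structure)


def hairpin_probability(seq, i, j):
    """Summed weight of structures containing the hairpin, divided by Q."""
    ensemble = list(structures(seq, 0, len(seq) - 1))
    q = sum(weight(s) for s in ensemble)  # partition function
    return sum(weight(s) for s in ensemble if has_hairpin(s, i, j)) / q


# Probability that G3-C7 (1-based) closes the AAA hairpin loop in a toy sequence.
p = hairpin_probability("GGGAAACCC", 2, 6)
```

Exhaustive enumeration grows exponentially with sequence length, so it only serves to check small cases by hand; a partition function calculation obtains the same kind of sum without listing structures.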

Figures

FIGURE 1.
Accuracy of loop probability estimation using the exact calculation. The probabilities of all possible hairpin loops (far left), internal loops (middle left), and bulge loops (middle right), and all multibranch loops found in low free energy structures (far right) were calculated. Loops with probabilities greater than the specified threshold were compared to the true structure. In each panel, the PPV (top plot) and sensitivity (bottom plot) are plotted as a function of threshold value. The dotted line in each plot gives the PPV or sensitivity of a minimum free energy structure prediction, i.e., the accuracy of loops that are present in the predicted minimum free energy structure.
FIGURE 2.
Variation in the accuracy of the probability calculation by family of structured RNA. For each family, PPV (top) and sensitivity (bottom) are shown for loops with calculated probabilities >40%. The 40% threshold is chosen arbitrarily, and variation is similar at other thresholds (Supplemental Table S1).
FIGURE 3.
Accuracy of loop probability estimation using stochastic sampling. The frequencies of hairpin loops (A), internal loops (B), bulge loops (C), and multibranch loops (D) found in 1000 structures provided probability estimates, and loops with probabilities greater than a specified threshold were compared to the known structure. Here, the PPV (top) and sensitivity (bottom) are plotted as a function of threshold value. The dotted line in each plot gives the PPV or sensitivity of a minimum free energy structure prediction.
FIGURE 4.
Accuracy of helix probability estimation. The probabilities of all possible helices containing 2–7 bp were calculated. Helices with probabilities greater than some threshold were compared to the true structure. Here, the PPV (top) and sensitivity (bottom) are plotted against threshold values at each length. The dotted line in each plot gives the PPV or sensitivity of a minimum free energy structure prediction.
FIGURE 5.
Accuracy of multibranch loop prediction without the use of coaxial stacking nearest neighbor parameters. The probabilities of all multibranch loops found in low free energy structures were calculated without the use of coaxial stacking nearest neighbor parameters, and loops with probabilities greater than the threshold were compared to the true structure. Here, the PPV (top) and sensitivity (bottom) are plotted against threshold value. The dotted line in each plot gives the PPV or sensitivity of a minimum free energy structure prediction without coaxial stacking parameters.
FIGURE 6.
The predicted minimum free energy structure of tRNA-arginine from Haloferax volcanii (Sprinzl et al. 1998), annotated with predicted probabilities for the loops and helices. An x across a base pair indicates an incorrectly predicted base pair, and the dashed line represents a true pair that is not in the predicted structure. Note that in the central multibranch loop, which is incorrectly predicted in the MFE structure, the calculated probability of the true loop is higher than that of the incorrectly predicted loop. In the structure calculations, modified nucleotides that cannot fit in A-form helices were forced to be unpaired (Mathews et al. 1999).
FIGURE 7.
A diagram depicting the loop probability calculation for (A) hairpin loops; (B) helices, bulges, and internal loops; and (C) multibranch loops. For the region of the RNA containing the loop or helix, shown in gray, the structure is known, and there is an equilibrium constant K for the region in the nearest neighbor parameters. For the regions with unknown structure, shown in black, the partition function for the region can be found in the V table from the partition function calculation. The known region is “frozen in place” while the rest of the structure varies, so all secondary structures containing the loop or helix are implicitly accounted for.
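The PPV and sensitivity curves described in the captions for Figures 1 through 5 can be sketched as follows. PPV is taken here as the fraction of loops above the threshold that appear in the known structure, and sensitivity as the fraction of known loops that exceed the threshold. The loop encoding (a tuple of loop type and closing positions), the function name, and the example numbers are all hypothetical and are not taken from the paper.

```python
def ppv_and_sensitivity(loop_probs, known_loops, threshold):
    """Score loops whose probability exceeds the threshold against a known structure.

    loop_probs  -- dict mapping a loop identifier to its calculated probability
    known_loops -- set of loop identifiers present in the known structure
    """
    predicted = {loop for loop, prob in loop_probs.items() if prob > threshold}
    true_positives = len(predicted & known_loops)
    # Return NaN rather than dividing by zero when a set is empty.
    ppv = true_positives / len(predicted) if predicted else float("nan")
    sensitivity = true_positives / len(known_loops) if known_loops else float("nan")
    return ppv, sensitivity


# Made-up probabilities for illustration only.
example_probs = {
    ("hairpin", 5, 12): 0.9,
    ("hairpin", 20, 30): 0.5,
    ("bulge", 2, 3, 33, 35): 0.2,
}
example_known = {("hairpin", 5, 12), ("internal", 40, 43, 50, 54)}

# One (threshold, PPV, sensitivity) row per threshold value.
curve = [(t, *ppv_and_sensitivity(example_probs, example_known, t))
         for t in (0.1, 0.4, 0.8)]
```

Raising the threshold shrinks the predicted set, which in this toy data raises PPV while sensitivity stays the same or falls.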
