Nucleic Acids Res. 2017 Jun 2;45(10):6168-6176.
doi: 10.1093/nar/gkx170.

A Sensitivity Analysis of RNA Folding Nearest Neighbor Parameters Identifies a Subset of Free Energy Parameters With the Greatest Impact on RNA Secondary Structure Prediction

Free PMC article

Jeffrey Zuber et al. Nucleic Acids Res.

Abstract

Nearest neighbor parameters for estimating the folding energy changes of RNA secondary structures are used in structure prediction and analysis. Despite their widespread application, a comprehensive analysis of the impact of each parameter on the precision of calculations had not been conducted. To identify the parameters with the greatest impact, a sensitivity analysis was performed on the 291 parameters that compose the 2004 version of the free energy nearest neighbor rules. Perturbed parameter sets were generated by changing each parameter independently. The effect of each individual parameter change on predicted base-pair probabilities and secondary structures, relative to the standard parameter set, was then observed for a set of sequences including structured ncRNA, mRNA and randomized sequences. The results identify for the first time the parameters with the greatest impact on secondary structure prediction, and the subset that should be prioritized for further study in order to improve the precision of structure prediction. Bulge loop initiation, multibranch loop initiation, AU/GU internal loop closure and AU/GU helix end parameters were particularly important. An analysis of parameter usage during folding free energy calculations of stochastic samples of secondary structures revealed a correlation between parameter usage and impact on structure prediction precision.
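
The one-at-a-time perturbation scheme described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the function names, the parameter names (`stack`, `loop_init`, `unused`), their values and the two-state `toy_model` are hypothetical and stand in for a real partition function calculation. It is not the authors' implementation.

```python
import math
from typing import Callable, Dict, List

Params = Dict[str, float]

RT = 0.616  # kcal/mol at 37 °C (approximate)


def rmsd(reference: List[float], perturbed: List[float]) -> float:
    """Root-mean-square deviation between two equal-length probability lists."""
    if len(reference) != len(perturbed):
        raise ValueError("probability lists must have the same length")
    squared = [(p - q) ** 2 for p, q in zip(reference, perturbed)]
    return math.sqrt(sum(squared) / len(squared))


def one_at_a_time_sensitivity(
    params: Params,
    std_errors: Params,
    model: Callable[[Params], List[float]],
    n_sigma: float = 3.0,
) -> Dict[str, Dict[str, float]]:
    """Perturb each parameter alone by +/- n_sigma standard errors and
    report the RMSD of the model output against the unperturbed output."""
    reference = model(params)
    results: Dict[str, Dict[str, float]] = {}
    for name in params:
        row: Dict[str, float] = {}
        for label, sign in (("plus", 1.0), ("minus", -1.0)):
            perturbed = dict(params)  # copy so only one parameter changes
            perturbed[name] = params[name] + sign * n_sigma * std_errors[name]
            row[label] = rmsd(reference, model(perturbed))
        results[name] = row
    return results


def toy_model(params: Params) -> List[float]:
    """Hypothetical stand-in for a partition function: two independent
    two-state helices whose pairing probability is 1 / (1 + exp(dG / RT))."""
    dg_a = params["stack"] + params["loop_init"]
    dg_b = 2.0 * params["stack"] + params["loop_init"]
    return [1.0 / (1.0 + math.exp(dg / RT)) for dg in (dg_a, dg_b)]


# Illustrative values only; they are not taken from the 2004 rules.
toy_params = {"stack": -2.0, "loop_init": 4.0, "unused": 1.0}
toy_errors = {"stack": 0.1, "loop_init": 0.3, "unused": 0.2}

sensitivity = one_at_a_time_sensitivity(toy_params, toy_errors, toy_model)
for name, row in sensitivity.items():
    print(f"{name}: +3 SE RMSD = {row['plus']:.4f}, -3 SE RMSD = {row['minus']:.4f}")
```

In this toy setting a parameter that never enters the energy calculation (`unused`) has zero RMSD, which mirrors the reported link between how often a parameter is used and how strongly it affects predictions.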

Figures

Figure 1.
Sensitivity analysis. In each panel, independent parameters are along the x-axis, organized by motif type and with a key below the plot. (A) Mean base pair probability RMSD for perturbations of ±3 standard errors, computed over the entire sequence archive except the randomized sequences. The RMSDs for +3 standard errors are shown above the x-axis, while the RMSDs for −3 standard errors are shown below the x-axis. (B) The sensitivity analysis using flat errors across all parameters. The analysis was performed as in Figure 1A, except that a σ value of 0.5 kcal/mol was used for each parameter instead of the experimentally determined errors. (C) The counts of parameter use. Use counts for each parameter were tabulated for folding free energy calculations for secondary structures sampled from the Boltzmann ensemble. This measurement was performed for all sequences. The counts for the dependent parameters were attributed to the independent parameters on which they depend.
Figure 2.
Parameter usage counts correlate with RMSD. The log10 of RMSD as a function of the log10 of the thermodynamic parameter usage count for calculating folding energies of a stochastic sample across all sequences. RMSD was calculated using a flat error estimate of +3σ (1.5 kcal/mol). A best fit line is shown and the linear correlation coefficient, R², is 0.8983.
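
A log-log fit of the kind described in this caption can be sketched in a few lines of Python. The function name `loglog_fit` and the example numbers are hypothetical and chosen only to make the arithmetic easy to follow. They are not data from the paper, and the ordinary least-squares fit is an assumption about how the best fit line was obtained.

```python
import math
from typing import List, Tuple


def loglog_fit(counts: List[float], rmsds: List[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares line through (log10(count), log10(RMSD)).

    Returns (slope, intercept, r_squared). All inputs must be positive,
    so parameters with zero usage or zero RMSD have to be excluded first.
    """
    xs = [math.log10(c) for c in counts]
    ys = [math.log10(r) for r in rmsds]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared


# Made-up usage counts and RMSDs that follow an exact power law.
example_counts = [10.0, 100.0, 1000.0]
example_rmsds = [0.001, 0.01, 0.1]
slope, intercept, r_squared = loglog_fit(example_counts, example_rmsds)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r_squared:.4f}")
```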
Figure 3.
The sensitivity of base pairing probability to parameter change is a function of the probability of the pair. (A) The mean absolute value of change in pairing probability plotted as a function of pairing probability. The change in each base pair probability in the entire sequence archive was averaged over every independent parameter change of −3 standard errors. The changes were then averaged for every pair probability bin. (B) A plot of the pair probability distribution. Shown is a histogram of the reference base pair probabilities. Note that ∼98% of the pair probabilities have a value <1%; the y-axis was limited to 50 000 counts per bin (the number of counts for the 0–1% bin is 17.44 million and the number of counts in the 1–2% bin is 69 865).
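
The binning step in panel A can be sketched as follows, assuming equal-width probability bins (1% wide by default, matching the 0–1% and 1–2% bins mentioned in the caption). The function name and the sample values are hypothetical. Pooling the changes over all parameter perturbations, as the caption describes, would amount to passing the concatenated lists.

```python
from typing import List, Optional, Tuple


def mean_abs_change_by_bin(
    reference: List[float],
    perturbed: List[float],
    n_bins: int = 100,
) -> Tuple[List[Optional[float]], List[int]]:
    """Average |perturbed - reference| within bins of the reference probability.

    Returns (means, counts); a bin with no pairs has a mean of None.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for p, q in zip(reference, perturbed):
        # A probability of exactly 1.0 is placed in the last bin.
        index = min(int(p * n_bins), n_bins - 1)
        sums[index] += abs(q - p)
        counts[index] += 1
    means = [s / c if c else None for s, c in zip(sums, counts)]
    return means, counts


# Made-up probabilities: two low-probability pairs and one mid-range pair.
ref_probs = [0.005, 0.008, 0.555]
new_probs = [0.006, 0.010, 0.500]
bin_means, bin_counts = mean_abs_change_by_bin(ref_probs, new_probs)
print(bin_counts[0], bin_counts[55])
```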
Figure 4.
The sensitivity to parameter changes is family dependent. The scatter plots show the sensitivity defect from changing a parameter by +3 standard errors for specific RNA families as a function of the average for all sequences (where the average is the mean of the per-family RMSDs). Therefore, this plot has one point per family for each of the independent parameters. If the sensitivity defect for a parameter in an individual RNA family is identical to the average across all families, its point falls on the diagonal line (shown in black). The mRNA and shuffled RNA sequences experience a greater sensitivity defect than the average (their points are generally above the diagonal line), while 5S rRNAs and tRNAs have a lower sensitivity defect than the average (their points generally fall below the line).
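
The comparison against the diagonal can be sketched as below, under the caption's definition that the average is the mean of the per-family RMSDs. The function name, the parameter labels and all numbers are hypothetical illustrations, not results from the paper.

```python
from typing import Dict, Tuple

FamilyTable = Dict[str, Dict[str, float]]  # family -> parameter -> RMSD


def family_vs_average(per_family: FamilyTable) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (average, fraction_above).

    average[param] is the mean of the per-family RMSDs for that parameter.
    fraction_above[family] is the share of parameters whose point lies above
    the diagonal, i.e. whose family RMSD exceeds the all-family average.
    """
    families = list(per_family)
    params = list(next(iter(per_family.values())))
    average = {
        p: sum(per_family[f][p] for f in families) / len(families) for p in params
    }
    fraction_above = {
        f: sum(per_family[f][p] > average[p] for p in params) / len(params)
        for f in families
    }
    return average, fraction_above


# Made-up RMSDs for two hypothetical parameters in three families.
example = {
    "mRNA": {"bulge_init": 0.030, "multibranch_init": 0.040},
    "tRNA": {"bulge_init": 0.010, "multibranch_init": 0.020},
    "5S rRNA": {"bulge_init": 0.020, "multibranch_init": 0.015},
}
average, fraction_above = family_vs_average(example)
for family, fraction in fraction_above.items():
    print(f"{family}: {fraction:.0%} of parameters above the diagonal")
```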
