RNA. 2010 Dec;16(12):2304-18.
doi: 10.1261/rna.1950510. Epub 2010 Oct 12.

Computational Approaches for RNA Energy Parameter Estimation

Free PMC article

Mirela Andronescu et al. RNA. 2010.

Abstract

Methods for efficient and accurate prediction of RNA structure are increasingly valuable, given the current rapid advances in understanding the diverse functions of RNA molecules in the cell. To enhance the accuracy of secondary structure predictions, we developed and refined optimization techniques for the estimation of energy parameters. We build on two previous approaches to RNA free-energy parameter estimation: (1) the Constraint Generation (CG) method, which iteratively generates constraints that enforce known structures to have energies lower than other structures for the same molecule; and (2) the Boltzmann Likelihood (BL) method, which infers a set of RNA free-energy parameters that maximize the conditional likelihood of a set of reference RNA structures. Here, we extend these approaches in two main ways: We propose (1) a max-margin extension of CG, and (2) a novel linear Gaussian Bayesian network that models feature relationships, which effectively makes use of sparse data by sharing statistical strength between parameters. We obtain significant improvements in the accuracy of RNA minimum free-energy pseudoknot-free secondary structure prediction when measured on a comprehensive set of 2518 RNA molecules with reference structures. Our parameters can be used in conjunction with software that predicts RNA secondary structures, RNA hybridization, or ensembles of structures. Our data, software, results, and parameter sets in various formats are freely available at http://www.cs.ubc.ca/labs/beta/Projects/RNA-Params.
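The two training criteria in the abstract can be illustrated on a toy problem. The sketch below assumes a linear energy model E = θ·c (parameter values times feature counts, as in the Turner model) and a hand-enumerated list of alternative structures, whereas real CG and BL search or sum over all structures with dynamic programming. `violated_constraints` checks the CG-style requirement that the reference structure be lower in energy than each alternative, optionally by a margin proportional to the number of differing base pairs (a common max-margin choice; the paper's margin may be defined differently). `boltzmann_log_likelihood` computes the conditional log-likelihood log P(y_ref | x) that BL maximizes. All function names, features, and numbers are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical representation: a structure is (set of base pairs, feature-count vector).
Structure = Tuple[frozenset, List[int]]

RT = 0.0019872 * 310.15  # gas constant in kcal/(mol K) times 37 °C in kelvin, about 0.616


def energy(theta: List[float], counts: List[int]) -> float:
    """Linear free-energy model: each feature count multiplies its parameter value."""
    return sum(t * c for t, c in zip(theta, counts))


def hamming_loss(ref_pairs: frozenset, other_pairs: frozenset) -> int:
    """Number of base pairs present in exactly one of the two structures."""
    return len(ref_pairs ^ other_pairs)


def violated_constraints(theta: List[float], ref: Structure,
                         alternatives: List[Structure],
                         margin_scale: float = 1.0) -> List[frozenset]:
    """Return alternatives breaking E(y) - E(y_ref) >= margin_scale * loss(y_ref, y).

    margin_scale = 0 only asks that the reference be no higher in energy;
    margin_scale > 0 is a loss-scaled max-margin variant.
    """
    ref_pairs, ref_counts = ref
    e_ref = energy(theta, ref_counts)
    violated = []
    for pairs, counts in alternatives:
        required_gap = margin_scale * hamming_loss(ref_pairs, pairs)
        if energy(theta, counts) - e_ref < required_gap:
            violated.append(pairs)
    return violated


def boltzmann_log_likelihood(theta: List[float], ref: Structure,
                             alternatives: List[Structure]) -> float:
    """log P(y_ref | x), with the partition function summed over an explicit list.

    The ensemble here is only the reference plus the listed alternatives.
    """
    energies = [energy(theta, ref[1])] + [energy(theta, c) for _, c in alternatives]
    scaled = [-e / RT for e in energies]
    top = max(scaled)  # log-sum-exp trick for numerical stability
    log_z = top + math.log(sum(math.exp(s - top) for s in scaled))
    return scaled[0] - log_z


# Toy molecule with two hypothetical features: [stacked pairs, hairpin loops].
theta = [-3.0, 5.0]
ref = (frozenset({(1, 20), (2, 19), (3, 18)}), [2, 1])
alternatives = [
    (frozenset(), [0, 0]),                    # open chain
    (frozenset({(2, 19), (3, 18)}), [1, 1]),  # shorter helix
]
print(violated_constraints(theta, ref, alternatives))       # [frozenset()]
print(violated_constraints(theta, ref, alternatives, 0.0))  # []
print(math.exp(boltzmann_log_likelihood(theta, ref, alternatives)))  # about 0.83
```

With a margin, the open chain becomes a violated constraint even though it is already higher in energy than the reference; a max-margin fit would then adjust the parameters until the reference is separated by the required gap.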

Figures

FIGURE 1.
Secondary structure of a Vimentin 3′ UTR protein-binding RNA region from the Rfam database (Rfam family RF00109, S76850.1/1539-1604). (A) The various types of loops in a pseudoknot-free secondary structure are indicated: stacked pair (two adjacent base pairs stacking onto each other), hairpin loop (HL—a region of unpaired bases closed by a base pair), internal loop (IL—two regions of unpaired bases closed by two base pairs), bulge loop (an internal loop with no unpaired bases on one side), multiloop (three or more stems connected together), and external loop. (B) Marked are examples of features of the Turner model and the corresponding Turner parameter values (Mathews et al. 1999a).
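The loop types in Figure 1A can be recovered mechanically from a pseudoknot-free structure. The sketch below assumes the structure is given in dot-bracket notation (a common convention, not taken from the paper) and classifies the loop closed by each base pair by counting the helices branching off inside it and the unpaired bases on either side. It only counts loop types; the Turner model then assigns each loop a sequence-dependent free-energy parameter, which is not modeled here. The function names are hypothetical.

```python
from collections import Counter
from typing import List


def pair_table(db: str) -> List[int]:
    """Partner index for each position of a dot-bracket string, or -1 if unpaired."""
    stack: List[int] = []
    pt = [-1] * len(db)
    for i, ch in enumerate(db):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket string")
            j = stack.pop()
            pt[i], pt[j] = j, i
    if stack:
        raise ValueError("unbalanced dot-bracket string")
    return pt


def loop_types(db: str) -> Counter:
    """Count the loop closed by every base pair, plus the single external loop."""
    pt = pair_table(db)
    counts = Counter({"external loop": 1})
    for i, j in enumerate(pt):
        if j <= i:  # skip unpaired positions and 3' partners
            continue
        branches, k = [], i + 1
        while k < j:  # walk the loop, jumping over enclosed helices
            if pt[k] > k:
                branches.append((k, pt[k]))
                k = pt[k] + 1
            else:
                k += 1
        if not branches:
            counts["hairpin loop"] += 1
        elif len(branches) == 1:
            inner_i, inner_j = branches[0]
            left, right = inner_i - i - 1, j - inner_j - 1  # unpaired bases per side
            if left == 0 and right == 0:
                counts["stacked pair"] += 1
            elif left == 0 or right == 0:
                counts["bulge loop"] += 1
            else:
                counts["internal loop"] += 1
        else:  # closing pair plus two or more branches: three or more stems
            counts["multiloop"] += 1
    return counts


# 3 stacked pairs, 2 hairpin loops, 1 multiloop, 1 external loop
print(loop_types("((.((...))((...)).))"))
```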
FIGURE 2.
Example of the relationship graph for one 1 × 1 internal loop. This internal loop is closed by two A-U base pairs, has one U-U mismatch, and has a sequence of type 5′RYY/RYY3′, where R is a purine (A or G) and Y is a pyrimidine (C or U). It is therefore connected to the features A-U closure (unnormalized weight 2, or normalized weight 2/4), U-U mismatch (unnormalized weight 1), and the corresponding purine-pyrimidine group (unnormalized weight 1).
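One way to read Figure 2 is sketched below, assuming the normalized edge weights act as the coefficients of a linear Gaussian conditional distribution: the 1 × 1 loop parameter is modeled as normally distributed around a weighted sum of the parameters it is connected to. The edge weights reproduce the caption (2, 1, 1, normalizing to 2/4, 1/4, 1/4); the parent parameter values, the standard deviation `sigma`, and all function names are hypothetical, and the paper's exact conditional may differ.

```python
import math
from fractions import Fraction
from typing import Dict


def normalize(weights: Dict[str, int]) -> Dict[str, Fraction]:
    """Divide each unnormalized edge weight by the node's total edge weight."""
    total = sum(weights.values())
    return {name: Fraction(w, total) for name, w in weights.items()}


def conditional_mean(norm_weights: Dict[str, Fraction], parents: Dict[str, float]) -> float:
    """Linear Gaussian mean: weighted sum of the connected feature parameters."""
    return sum(float(w) * parents[name] for name, w in norm_weights.items())


def gaussian_log_density(x: float, mean: float, sigma: float) -> float:
    """log N(x; mean, sigma^2): larger when x stays close to the mean."""
    return -0.5 * ((x - mean) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


edges = {"A-U closure": 2, "U-U mismatch": 1, "5'RYY/RYY3' group": 1}
w = normalize(edges)  # A-U closure -> 2/4, the other two -> 1/4 each
# Hypothetical parent parameter values in kcal/mol.
parents = {"A-U closure": 0.8, "U-U mismatch": -0.4, "5'RYY/RYY3' group": 1.2}
mu = conditional_mean(w, parents)  # 0.5*0.8 + 0.25*(-0.4) + 0.25*1.2, about 0.6
print(w["A-U closure"], mu, gaussian_log_density(0.9, mu, sigma=0.5))
```

Because the mean borrows from features that occur in many loops, a rarely observed 1 × 1 loop parameter is pulled toward a plausible value instead of being fit from its few occurrences alone; this is one way to picture the sharing of statistical strength described in the abstract.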
FIGURE 3.
Sensitivity and positive predictive value (PPV) of several parameter sets when measured on S-STRAND2. Each point, and the training set used for it, is described in Table 3. CONTRAfold uses a parameter γ to set the trade-off between sensitivity and PPV (we used values from 1 to 20).
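Sensitivity and PPV are usually computed over base pairs: sensitivity is the fraction of reference pairs that were predicted, and PPV is the fraction of predicted pairs that appear in the reference. The sketch below counts exact matches only (some benchmarks also credit slightly shifted pairs) and averages per molecule; whether the paper averages per molecule or pools pairs across the dataset is an assumption here, and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[int, int]


def sensitivity_ppv(predicted: Iterable[Pair], reference: Iterable[Pair]) -> Tuple[float, float]:
    """Base-pair sensitivity TP/(TP+FN) and PPV TP/(TP+FP), exact matches only."""
    pred = {tuple(sorted(p)) for p in predicted}  # store each pair as (i, j) with i < j
    ref = {tuple(sorted(p)) for p in reference}
    tp = len(pred & ref)
    sensitivity = tp / len(ref) if ref else 0.0
    ppv = tp / len(pred) if pred else 0.0
    return sensitivity, ppv


def average_scores(molecules: List[Tuple[Iterable[Pair], Iterable[Pair]]]) -> Tuple[float, float]:
    """Mean per-molecule sensitivity and PPV over (predicted, reference) tuples."""
    scores = [sensitivity_ppv(p, r) for p, r in molecules]
    n = len(scores)
    return sum(s for s, _ in scores) / n, sum(v for _, v in scores) / n


reference = [(1, 20), (2, 19), (3, 18), (4, 17)]
predicted = [(1, 20), (19, 2), (5, 15)]
# 2 of 4 reference pairs recovered, 2 of 3 predicted pairs correct
print(sensitivity_ppv(predicted, reference))
```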
FIGURE 4.
Correlation in prediction accuracy (F-measure) per molecule between our best parameters BL-FR* and the Turner99 parameters, on all the long and short structures in the S-STRAND2 set. (A) Structures of lengths 2000–4000 nucleotides; the correlation coefficient is 0.72. (B) Structures of lengths 0 to 200 nucleotides; the correlation coefficient is 0.59.
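The per-molecule comparison in Figure 4 can be sketched as follows, assuming the usual F-measure (harmonic mean of sensitivity and PPV, equivalently 2·TP / (predicted pairs + reference pairs)) and Pearson's correlation coefficient, which the caption does not name explicitly. The per-molecule counts below are made up for illustration and are not data from the paper.

```python
import math
from typing import Sequence


def f_measure(tp: int, n_predicted: int, n_reference: int) -> float:
    """Harmonic mean of sensitivity (tp / n_reference) and PPV (tp / n_predicted)."""
    if tp == 0:
        return 0.0
    sensitivity = tp / n_reference
    ppv = tp / n_predicted
    return 2 * sensitivity * ppv / (sensitivity + ppv)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical (tp, predicted pairs, reference pairs) per molecule for two parameter sets.
set_a = [(8, 10, 10), (3, 9, 12), (15, 16, 18), (0, 5, 7)]
set_b = [(6, 10, 10), (4, 11, 12), (12, 17, 18), (1, 6, 7)]
f_a = [f_measure(*m) for m in set_a]
f_b = [f_measure(*m) for m in set_b]
print(round(pearson(f_a, f_b), 2))
```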


Cited by 38 articles
