Improved free energy parameters for RNA pseudoknotted secondary structure prediction

RNA. 2010 Jan;16(1):26-42. doi: 10.1261/rna.1689910. Epub 2009 Nov 20.

Abstract

Accurate prediction of RNA pseudoknotted secondary structures from the base sequence is a challenging computational problem. Since prediction algorithms rely on thermodynamic energy models to identify low-energy structures, prediction accuracy depends in large part on the quality of free energy change parameters. In this work, we use our earlier constraint generation and Boltzmann likelihood parameter estimation methods to obtain new energy parameters for two energy models for secondary structures with pseudoknots, namely, the Dirks-Pierce (DP) and the Cao-Chen (CC) models. To train our parameters, and also to test their accuracy, we create a large data set of both pseudoknotted and pseudoknot-free secondary structures. In addition to structural data, our training data set also includes thermodynamic data, that is, sequences and reference structures for which experimentally determined free energy changes are available. When incorporated into the HotKnots prediction algorithm, our new parameters result in significantly improved secondary structure prediction on our test data set. Specifically, the prediction accuracy when using our new parameters improves from 68% to 79% for the DP model, and from 70% to 77% for the CC model.
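
The abstract does not spell out the estimation procedure, so the following is only a minimal sketch of the general idea behind a Boltzmann likelihood objective, not the authors' implementation. It assumes that the free energy of a structure is a linear function of the parameters (feature counts times parameter values) and that a small, explicitly enumerated set of candidate structures stands in for the full structure space. All function names, the feature-count representation, and the example numbers are hypothetical.

```python
import math

# Gas constant times temperature at 37 C, in kcal/mol
# (R = 0.0019872 kcal/(mol K), T = 310.15 K).
RT = 0.0019872 * 310.15


def energy(counts, params):
    """Free energy of one structure, assumed linear in the parameters."""
    return sum(c * p for c, p in zip(counts, params))


def boltzmann_probabilities(energies, rt=RT):
    """Boltzmann probability of each candidate: exp(-E/RT) / Z."""
    lowest = min(energies)  # shift energies for numerical stability
    weights = [math.exp(-(e - lowest) / rt) for e in energies]
    z = sum(weights)  # partition function over the candidate set
    return [w / z for w in weights]


def log_likelihood(candidates, reference_index, params, rt=RT):
    """Log-probability of the reference structure among the candidates."""
    energies = [energy(counts, params) for counts in candidates]
    probs = boltzmann_probabilities(energies, rt)
    return math.log(probs[reference_index])


# Hypothetical example: three candidate structures described by counts of
# two features (e.g. stacked pairs and a pseudoknot penalty).
candidates = [[4, 1], [5, 0], [2, 0]]
params = [-2.0, 3.0]  # illustrative values in kcal/mol, not fitted ones
print(log_likelihood(candidates, reference_index=0, params=params))
```

Under these assumptions, parameter estimation amounts to choosing the parameter values that maximize the summed log-likelihood of the reference structures over the training set; a real implementation would compute the partition function with dynamic programming rather than by enumeration.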

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Base Sequence
  • Computational Biology / methods*
  • Energy Metabolism / physiology
  • Forecasting / methods
  • Models, Genetic
  • Molecular Sequence Data
  • Nucleic Acid Conformation*
  • RNA / analysis
  • RNA / chemistry*
  • Sensitivity and Specificity
  • Sequence Analysis, RNA
  • Software
  • Thermodynamics

Substances

  • RNA