Bioinformatics. 2012 Nov 1;28(21):2738-46.
doi: 10.1093/bioinformatics/bts519. Epub 2012 Aug 24.

RIsearch: Fast RNA-RNA Interaction Search Using a Simplified Nearest-Neighbor Energy Model

Free PMC article

Anne Wenzel et al. Bioinformatics.

Abstract

Motivation: Regulatory, non-coding RNAs often function by forming a duplex with other RNAs. It is therefore of interest to predict putative RNA-RNA duplexes in silico on a genome-wide scale. Current computational methods for predicting these interactions range from fast complementarity-based searches to those that take intramolecular binding into account. Together these methods constitute a trade-off between speed and accuracy, while leaving room for improvement within the context of genome-wide screens. A fast pre-filtering of putative duplexes would therefore be desirable.

Results: We present RIsearch, an implementation of a simplified Turner energy model for fast computation of hybridization, which significantly reduces runtime while maintaining accuracy. Its time complexity for sequences of lengths m and n is O(m·n), with a much smaller pre-factor than other tools. We show that this energy model is an accurate approximation of the full energy model for near-complementary RNA-RNA duplexes. RIsearch uses a Smith-Waterman-like algorithm with a dinucleotide scoring matrix that approximates the Turner nearest-neighbor energies. In benchmarks, RIsearch achieves a speed improvement of at least 2.4× compared with RNAplex, the currently fastest method for searching near-complementary regions. RIsearch shows a prediction accuracy similar to RNAplex on two datasets of known bacterial short RNA (sRNA)-messenger RNA (mRNA) and eukaryotic microRNA (miRNA)-mRNA interactions. Using RIsearch as a pre-filter in genome-wide screens reduces the number of binding site candidates reported by miRNA target prediction programs, such as TargetScanS and miRanda, by up to 70%. Likewise, substantial filtering was performed on bacterial RNA-RNA interaction data.

Availability: The source code for RIsearch is available at: http://rth.dk/resources/risearch.

Figures

Fig. 1.
The flow in the three-state model. This state model was developed for, and has proven useful in, the pairwise alignment of amino acid sequences using doublets, thereby taking into account the correlation of neighboring residues (Akbasli, 2008). Here, the dashes indicate bulges or asymmetric internal loops, but are equivalent to gaps when the state model is applied to sequence alignments. States are represented by circles, transitions by connecting arcs. The pairs of numbers in the circles indicate the index increments to reach that state, e.g. for the Bq-state (bulge in query) only the q(uery) index is incremented, thus ‘(+1,0)’, while the M-state is reached by (mis)matching two residues, so both indices are updated. In a DP matrix, this corresponds to moving diagonally for transitioning into the M-state, and horizontally/vertically for the B-states. Indices along the t(arget) sequence are decremented as the two interacting RNA strands run in opposite directions. See the recursion in the text for further description.
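The state flow described in this caption can be sketched as a small dynamic program. The following is a minimal illustration, not the RIsearch implementation: the scores are hypothetical toy values rather than Turner-derived parameters, pairs are scored one at a time instead of as dinucleotides, and the set of allowed transitions (M to either bulge state, each bulge state to itself, and any state into M) is an assumption. Reversing the target stands in for decrementing the target index.

```python
# Toy scores (hypothetical, NOT the RIsearch or Turner parameters).
PAIR_SCORE = {
    ("A", "U"): 2, ("U", "A"): 2,
    ("G", "C"): 3, ("C", "G"): 3,
    ("G", "U"): 1, ("U", "G"): 1,
}
MISMATCH = -3     # score for a non-pairing position in the M-state
GAP_OPEN = -4     # entering a bulge state from M
GAP_EXTEND = -1   # staying in the same bulge state


def pair_score(q, t):
    """Score for placing query base q opposite target base t."""
    return PAIR_SCORE.get((q, t), MISMATCH)


def best_duplex_score(query, target):
    """Best local score of query (5'->3') against target read 3'->5'."""
    rev = target[::-1]  # walking rev forwards = decrementing the target index
    m, n = len(query), len(rev)
    neg = float("-inf")
    M = [[neg] * (n + 1) for _ in range(m + 1)]   # (mis)match state
    Bq = [[neg] * (n + 1) for _ in range(m + 1)]  # bulge in query
    Bt = [[neg] * (n + 1) for _ in range(m + 1)]  # bulge in target
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = pair_score(query[i - 1], rev[j - 1])
            # M: diagonal move, both indices advance; 0 starts a new local hit.
            M[i][j] = s + max(0, M[i - 1][j - 1], Bq[i - 1][j - 1], Bt[i - 1][j - 1])
            # Bq: only the query index advances.
            Bq[i][j] = max(M[i - 1][j] + GAP_OPEN, Bq[i - 1][j] + GAP_EXTEND)
            # Bt: only the target index advances.
            Bt[i][j] = max(M[i][j - 1] + GAP_OPEN, Bt[i][j - 1] + GAP_EXTEND)
            best = max(best, M[i][j])
    return best


print(best_duplex_score("GGGG", "CCCC"))   # four G-C pairs
print(best_duplex_score("GGAGG", "CCCC"))  # one bulged A in the query
```

The two nested loops visit each cell of the m × n matrix once, and each cell does a constant amount of work across the three states.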
Fig. 2.
Approximated loop energies. In red, energies as given by Turner 2004 parameters. In blue, the linear approximation used in RIsearch. Values for small loops are given as box plots (RIsearch to the right). (a) Bulge loops: the affine gap model is exact for bulge sizes 2–6, and over-penalizes larger loops. (b) Interior loops: only the symmetric case is shown; for asymmetric loops a penalty is added. Furthermore, parameters for AU/GU closure and terminal mismatch are applied where required in both schemes. Small symmetric internal loops (1 × 1 and 2 × 2) have tabulated free energy changes, here shown as box plots. Next to that, RIsearch approximations are plotted, including the aforementioned parameters.
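To illustrate how an affine penalty can match a reference curve over a limited range and over-penalize beyond it, here is a small sketch. All numbers are illustrative placeholders, not the Turner 2004 or RIsearch parameters, and the shape of the stand-in reference curve (linear up to size 6, logarithmic growth afterwards) is an assumption made only for this example.

```python
import math

# Illustrative placeholder parameters in kcal/mol (assumptions, not paper values).
OPEN_COST = 2.0     # hypothetical loop-opening cost
EXTEND_COST = 0.4   # hypothetical per-nucleotide extension cost
LOG_FACTOR = 1.08   # hypothetical growth factor for large loops


def affine_penalty(size):
    """Affine (linear) loop penalty: opening cost plus a cost per nucleotide."""
    return OPEN_COST + EXTEND_COST * size


def reference_penalty(size):
    """Stand-in reference: linear up to size 6, logarithmic beyond."""
    if size <= 6:
        return OPEN_COST + EXTEND_COST * size
    return reference_penalty(6) + LOG_FACTOR * math.log(size / 6)


for size in (2, 6, 10, 20):
    print(size, round(affine_penalty(size), 2), round(reference_penalty(size), 2))
```

For sizes 2 to 6 the two functions agree; for larger loops the affine penalty keeps growing linearly while the stand-in reference flattens, so the affine model charges more.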
Fig. 3.
Accuracy on simulated data. Data shown here for length = 50 nt, GC-content = 50%. (a) Average of all computed MFEs given a certain LD as reported by the different tools. (b) Correlation of MFE values as returned by DuplexFold versus RIsearch. (c) Overlap of helices in the top 5% ranking predictions. (d) Relative difference in reported energies, computed as |(DuplexFold–RIsearch)/DuplexFold|. The boxes represent the interquartile range (IQR), from the first quartile to the third quartile; the band inside denotes the median. The whiskers extend to the most extreme data points within 1.5 × IQR from the box. Outliers are shown as circles.
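The relative difference in panel (d) and the whisker rule can be written out as a short sketch. The function names and the sample numbers are hypothetical; the quartiles are passed in explicitly because quartile conventions differ between tools and the caption does not say which one was used.

```python
def relative_difference(reference_mfe, approx_mfe):
    """|(reference - approximation) / reference|, as in the caption formula."""
    return abs((reference_mfe - approx_mfe) / reference_mfe)


def whiskers_and_outliers(values, q1, q3):
    """Whiskers reach the most extreme points within 1.5 * IQR of the box."""
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return min(inside), max(inside), outliers


# Hypothetical energies in kcal/mol, for illustration only.
print(relative_difference(-20.0, -18.0))
print(whiskers_and_outliers([1, 2, 3, 4, 5, 100], q1=2, q3=5))
```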
Fig. 4.
RIsearch as filter for bacterial sRNA–mRNA interactions. The color key refers to RIsearch energy cutoffs. TNR (or specificity) is synonymous with the search space reduction we can achieve with different cutoffs. Recall (or sensitivity, TPR) shows how many of the known interactions we retain.
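The two quantities in this caption can be computed from a list of energies and a cutoff, as in the following sketch. It assumes that a candidate passes the filter when its energy is at or below the cutoff (more negative meaning more stable); the function name and the example energies are hypothetical.

```python
def filter_metrics(known_energies, background_energies, cutoff):
    """Return (recall, tnr) for an energy cutoff.

    known_energies: energies of known (true) interactions.
    background_energies: energies of candidates that are not known interactions.
    A candidate is kept when its energy <= cutoff (assumed convention).
    """
    kept_known = sum(1 for e in known_energies if e <= cutoff)
    rejected_background = sum(1 for e in background_energies if e > cutoff)
    recall = kept_known / len(known_energies)
    tnr = rejected_background / len(background_energies)
    return recall, tnr


# Hypothetical energies in kcal/mol, for illustration only.
known = [-20.0, -15.0, -8.0]
background = [-12.0, -5.0, -3.0, -2.0]
recall, tnr = filter_metrics(known, background, cutoff=-10.0)
print(recall, tnr)
```

A stricter (more negative) cutoff rejects more background candidates, raising TNR, at the risk of discarding known interactions and lowering recall.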

References

    1. Akbasli E. Fast sequence alignment in a managed programming language. MSc Thesis, IT University of Copenhagen; 2008.
    2. Alkan C, et al. RNA–RNA interaction prediction and antisense RNA target search. J. Comput. Biol. 2006;13:267–282.
    3. Amaral PP, et al. The eukaryotic genome as an RNA machine. Science. 2008;319:1787–1789.
    4. Andronescu M, et al. Secondary structure prediction of interacting RNA molecules. J. Mol. Biol. 2005;345:987–1001.
    5. Barron N, et al. MicroRNAs: tiny targets for engineering CHO cell phenotypes? Biotechnol. Lett. 2011;33:11–21.