Background: The use of exogenous small interfering RNAs (siRNAs) for gene silencing has quickly become a widespread molecular tool providing a powerful means for gene functional study and new drug target identification. Although considerable progress has been made recently in understanding how the RNAi pathway mediates gene silencing, the design of potent siRNAs remains challenging.
Results: We propose a simple linear model combining basic features of siRNA sequences for siRNA efficacy prediction. Trained and tested on a large dataset of siRNA sequences made recently available, it performs as well as more complex state-of-the-art models in terms of potency prediction accuracy, with the advantage of being directly interpretable. The analysis of this linear model allows us to detect and quantify the effect of nucleotide preferences at particular positions, including previously known and new observations. We also detect and quantify a strong propensity of potent siRNAs to contain short asymmetric motifs in their sequence, and show that, surprisingly, these motifs alone contain at least as much relevant information for potency prediction as the nucleotide preferences for particular positions.
Conclusion: The model proposed for prediction of siRNA potency is as accurate as a state-of-the-art nonlinear model and is easily interpretable in terms of biological features. It is freely available on the web at http://cbio.ensmp.fr/dsir.