cWords - systematic microRNA regulatory motif discovery from mRNA expression data

Silence. 2013 May 20;4(1):2. doi: 10.1186/1758-907X-4-2.


Background: Post-transcriptional regulation of gene expression by small RNAs and RNA binding proteins is of fundamental importance in development of complex organisms, and dysregulation of regulatory RNAs can influence onset, progression and potentially be target for treatment of many diseases. Post-transcriptional regulation by small RNAs is mediated through partial complementary binding to messenger RNAs leaving nucleotide signatures or motifs throughout the entire transcriptome. Computational methods for discovery and analysis of sequence motifs in high-throughput mRNA expression profiling experiments are becoming increasingly important tools for the identification of post-transcriptional regulatory motifs and the inference of the regulators and their targets.

Results: cWords is a method designed for regulatory motif discovery in differential case-control mRNA expression datasets. We have improved the algorithms and statistical methods of cWords, resulting in at least a factor 100 speed gain over the previous implementation. On a benchmark dataset of 19 microRNA (miRNA) perturbation experiments cWords showed equal or better performance than two comparable methods, miReduce and Sylamer. We have developed rigorous motif clustering and visualization that accompany the cWords analysis for more intuitive and effective data interpretation. To demonstrate the versatility of cWords we show that it can also be used for identification of potential siRNA off-target binding. Moreover, cWords analysis of an experiment profiling mRNAs bound by Argonaute ribonucleoprotein particles discovered endogenous miRNA binding motifs.

Conclusions: cWords is an unbiased, flexible and easy-to-use tool designed for regulatory motif discovery in differential case-control mRNA expression datasets. cWords is based on rigorous statistical methods that demonstrate comparable or better performance than other existing methods. Rich visualization of results promotes intuitive and efficient interpretation of data. cWords is available as a stand-alone Open Source program at Github and as a web-service at: