Motivation: MicroRNAs (miRNAs) are a class of endogenous small RNAs derived from a precursor (pre-miRNA) and involved in post-transcriptional regulation. Experimental identification of novel miRNAs is difficult because they are often transcribed only under specific conditions and in specific cell types. Several computational methods have been developed to detect new miRNAs, starting from known ones or from deep sequencing data, and to validate their pre-miRNAs.
Results: We present a genome-wide search algorithm, called MIReNA, that looks for miRNA sequences by exploring a multidimensional space defined by only five (physical and combinatorial) parameters characterizing acceptable pre-miRNAs. MIReNA validates pre-miRNAs with high sensitivity and specificity, and detects new miRNAs by homology from known miRNAs or from deep sequencing data. We compare the performance of MIReNA with that of four available predictive systems. The MIReNA approach is strikingly simple, yet it proves at least as powerful as more sophisticated algorithmic methods. MIReNA obtains better results than three known algorithms that validate pre-miRNAs. This demonstrates that machine learning is not necessary for the computational validation of pre-miRNAs. In particular, machine learning algorithms can only confirm pre-miRNAs that resemble known ones, which is a limitation when exploring species with no known pre-miRNAs. A major feature of MIReNA is the possibility of adapting the search to specific species, whose miRNAs and pre-miRNAs may have specific properties. Adjusting the parameters calibrates the specificity and sensitivity of MIReNA, a key feature for predictive systems that is absent from machine learning approaches. A comparison of MIReNA with miRDeep on miRNA prediction from deep sequencing data highlights the high specificity of MIReNA.
Availability: MIReNA is available at http://www.ihes.fr/carbone/data8/.