In the last few years, interactions among competing endogenous RNAs (ceRNAs) have been recognized as a key post-transcriptional regulatory mechanism in cell differentiation, tissue development, and disease. Notably, such sponge phenomena, which subtract active microRNAs from their silencing targets, have been reported to play a potentially oncosuppressive or oncogenic role in several cancer types. Hence, the ability to predict sponges from the analysis of large expression data sets (e.g. from international cancer projects) has become an important data mining task in bioinformatics. We present a technique designed to mine sponge phenomena whose presence or absence may discriminate between healthy and unhealthy populations of samples in tumoral or normal expression data sets, thus providing lists of candidates potentially relevant to the pathology. To this end, we search for pairs of elements acting as ceRNAs for a given miRNA; that is, we aim to discover miRNA-RNA pairs involved in phenomena that are clearly present in one population and almost absent in the other. The results on tumoral expression data, covering five different cancer types, confirm the effectiveness of the approach in mining interesting knowledge. Indeed, 32 out of 33 miRNAs and 22 out of 25 protein-coding genes identified as top scoring in our analysis have been similarly associated with cancer processes in independent studies. Moreover, the subset of miRNAs selected by the sponge analysis shows a significant enrichment of annotation for the KEGG32 pathway "microRNAs in cancer" when tested with the commonly used bioinformatic resource DAVID. Finally, the cancer data sets in which our sponge analysis identified a miRNA as top scoring often match those already reported in the pertinent literature.
Keywords: Sponge phenomena; healthy/unhealthy tissue classification; non-coding RNA.