Small RNAs (sRNAs) are 20-25 nt non-coding RNAs that act as guides for the highly sequence-specific regulatory mechanism known as RNA silencing. Recent increases in sequencing depth have revealed a highly complex and diverse population of sRNAs in both plants and animals. However, the exponential increase in sequencing data has also made it more challenging to identify individual sRNA transcripts corresponding to biological units (sRNA loci) when the identification is based exclusively on the genomic location of the constituent sRNAs, which hinders existing approaches to locus detection. To infer the location of significant biological units, we propose an approach for sRNA loci detection called CoLIde (Co-expression based sRNA Loci Identification) that combines genomic location with the analysis of other information, such as variation in expression levels (expression pattern) and size class distribution. For CoLIde, we define a locus as a union of regions that share the same expression pattern and are located in close proximity on the genome. The biological relevance of each locus, detected through the analysis of its size class distribution, is also calculated. CoLIde can be applied to ordered (e.g., time-dependent) or unordered (e.g., organ, mutant) series of samples, with or without biological/technical replicates. The method reliably identifies known types of loci and shows improved performance on sequencing data from both plants (e.g., A. thaliana, S. lycopersicum) and animals (e.g., D. melanogaster) when compared with existing locus detection techniques. CoLIde is available for use within the UEA Small RNA Workbench, which can be downloaded from http://srna-workbench.cmp.uea.ac.uk.
Keywords: expression level; high throughput sequencing; miRNA; microRNA; pattern; sRNA; sRNA loci; sRNAome; small RNA.
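The locus definition given in the abstract (a union of nearby regions sharing the same expression pattern) can be illustrated with a minimal Python sketch. This is not the CoLIde implementation: the `Region` type, the U/D/S (up/down/straight) encoding, the fixed `tolerance` threshold, the `max_gap` distance, and the choice to group regions by pattern before merging are all hypothetical, chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Region:
    """A genomic region with an expression pattern (hypothetical type)."""
    chrom: str
    start: int
    end: int
    pattern: str


def expression_pattern(levels: Sequence[float], tolerance: float = 0.0) -> str:
    """Encode the change between consecutive samples as U, D or S.

    Assumption: a simple absolute tolerance decides whether two levels
    differ; a real method would need a statistically grounded criterion.
    """
    symbols = []
    for prev, curr in zip(levels, levels[1:]):
        if curr - prev > tolerance:
            symbols.append("U")  # expression goes up
        elif prev - curr > tolerance:
            symbols.append("D")  # expression goes down
        else:
            symbols.append("S")  # expression stays level
    return "".join(symbols)


def merge_into_loci(regions: List[Region], max_gap: int = 50) -> List[Region]:
    """Merge same-pattern regions on one chromosome lying within max_gap nt."""
    loci: List[Region] = []
    # Sorting by (chrom, pattern, start) places merge candidates side by side.
    for region in sorted(regions, key=lambda r: (r.chrom, r.pattern, r.start)):
        last = loci[-1] if loci else None
        if (
            last is not None
            and last.chrom == region.chrom
            and last.pattern == region.pattern
            and region.start - last.end <= max_gap
        ):
            # Close enough and same pattern: extend the current locus.
            last.end = max(last.end, region.end)
        else:
            # Otherwise start a new locus (copy so the input is not mutated).
            loci.append(Region(region.chrom, region.start, region.end, region.pattern))
    return loci


regions = [
    Region("chr1", 100, 124, "UD"),
    Region("chr1", 130, 153, "UD"),
    Region("chr1", 400, 423, "UD"),
    Region("chr1", 140, 160, "DS"),
]
loci = merge_into_loci(regions, max_gap=50)
```

In this toy input, the first two `UD` regions are 6 nt apart and collapse into one locus, the third `UD` region is too distant to join them, and the `DS` region stays separate because its pattern differs. A size class distribution check, as mentioned in the abstract, would be a further step applied to each resulting locus and is not shown here.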