Background: MicroRNAs (miRNAs) are a large family of 21-22 nucleotide non-coding RNAs with presumed post-transcriptional regulatory activity. Most miRNAs were identified by direct cloning of small RNAs, an approach that favors detection of abundant miRNAs. Three observations suggested that miRNA genes might be identified using a computational approach. First, miRNAs generally derive from precursor transcripts of 70-100 nucleotides with extended stem-loop structure. Second, miRNAs are usually highly conserved between the genomes of related species. Third, miRNAs display a characteristic pattern of evolutionary divergence.
Results: We developed an informatic procedure called 'miRseeker', which analyzed the completed euchromatic sequences of Drosophila melanogaster and D. pseudoobscura for conserved sequences that adopt an extended stem-loop structure and display a pattern of nucleotide divergence characteristic of known miRNAs. The sensitivity of this computational procedure was demonstrated by the presence of 75% (18/24) of previously identified Drosophila miRNAs within the top 124 candidates. In total, we identified 48 novel miRNA candidates that were strongly conserved in more distant insect, nematode, or vertebrate genomes. We verified expression for a total of 24 novel miRNA genes, including 20 of 27 candidates conserved in a third species and 4 of 11 high-scoring, Drosophila-specific candidates. Our analyses lead us to estimate that drosophilid genomes contain around 110 miRNA genes.
Conclusions: Our computational strategy succeeded in identifying bona fide miRNA genes and suggests that miRNAs constitute nearly 1% of predicted protein-coding genes in Drosophila, a percentage similar to the percentage of miRNAs recently attributed to other metazoan genomes.