MicroRNA prediction with a novel ranking algorithm based on random walks

Bioinformatics. 2008 Jul 1;24(13):i50-8. doi: 10.1093/bioinformatics/btn175.

Abstract

MicroRNAs (miRNAs) play essential roles in post-transcriptional gene regulation in animals and plants. Several computational approaches have been developed to complement experimental methods in the discovery of miRNAs that are expressed only under specific environmental conditions or in specific cell types. These computational methods require a sufficient number of characterized miRNAs as training samples, and rely on genome annotation to reduce the number of predicted putative miRNAs. However, most sequenced genomes have not been well annotated, and many of them have very few experimentally characterized miRNAs. As a result, the existing methods are ineffective, or even infeasible, for identifying miRNAs in these genomes. To identify miRNAs in genomes with few known miRNAs and/or little annotation, we propose and develop a novel miRNA prediction method, miRank, based on our new random walk-based ranking algorithm. We first tested our method on the Homo sapiens genome; using very few known human miRNAs as samples, our method achieved a prediction accuracy greater than 95%. We then applied our method to predict 200 miRNAs in Anopheles gambiae, which is the most important vector of malaria in Africa. Our further study showed that 78 of the 200 putative miRNA precursors encode mature miRNAs that are conserved in at least one other animal species. These conserved putative miRNAs are good candidates for further experimental study to understand malaria infection.
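
The abstract does not spell out the ranking algorithm itself. As a rough illustration of the general idea of ranking candidates by a random walk seeded from a few known examples, the sketch below uses a generic random walk with restart on a similarity graph. It is not the miRank algorithm: the function name `random_walk_rank`, the `restart` parameter, and the toy similarity matrix are all assumptions made for illustration.

```python
def random_walk_rank(weights, seeds, restart=0.15, tol=1e-10, max_iter=1000):
    """Score nodes by a random walk with restart from the seed nodes.

    weights: n x n list of non-negative pairwise similarities
             (hypothetical input, e.g. similarity between candidate hairpins).
    seeds:   indices of the known positive examples.
    """
    n = len(weights)

    # Row-normalise similarities into transition probabilities.
    trans = []
    for row in weights:
        total = sum(row)
        trans.append([w / total if total > 0 else 0.0 for w in row])

    # Restart distribution: uniform over the seed nodes.
    start = [0.0] * n
    for s in seeds:
        start[s] = 1.0 / len(seeds)

    scores = start[:]
    for _ in range(max_iter):
        # One step: restart with probability `restart`, otherwise follow an edge.
        new = [
            restart * start[j]
            + (1 - restart) * sum(scores[i] * trans[i][j] for i in range(n))
            for j in range(n)
        ]
        delta = sum(abs(a - b) for a, b in zip(new, scores))
        scores = new
        if delta < tol:
            break
    return scores


# Toy example (made-up numbers): node 0 is a known example, nodes 1 and 2
# resemble it closely, node 3 only weakly.
toy_weights = [
    [0.0, 0.9, 0.8, 0.1],
    [0.9, 0.0, 0.7, 0.1],
    [0.8, 0.7, 0.0, 0.2],
    [0.1, 0.1, 0.2, 0.0],
]
toy_scores = random_walk_rank(toy_weights, seeds=[0])
ranking = sorted(range(len(toy_scores)), key=lambda i: -toy_scores[i])
```

In this toy graph the weakly connected candidate (node 3) receives the lowest score, so sorting candidates by score places it last. A scheme of this kind needs only a handful of seed examples and a similarity measure, which is why it suits genomes with few characterized miRNAs.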

Availability: miRank is implemented in MATLAB on the Windows platform. The source code is available upon request.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms*
  • Base Sequence
  • Chromosome Mapping / methods*
  • Computer Simulation
  • Data Interpretation, Statistical
  • MicroRNAs / genetics*
  • Models, Genetic
  • Models, Statistical
  • Molecular Sequence Data
  • Sequence Alignment / methods*
  • Sequence Analysis, RNA / methods*

Substances

  • MicroRNAs