Finding a common motif of RNA sequences using genetic programming: the GeRNAMo system

IEEE/ACM Trans Comput Biol Bioinform. 2007 Oct-Dec;4(4):596-610. doi: 10.1109/tcbb.2007.1045.

Abstract

We focus on finding a consensus motif of a set of homologous or functionally related RNA molecules. Recent approaches to this problem are limited to simple motifs, require sequence alignment, and make prior assumptions concerning the data set. We use genetic programming to predict RNA consensus motifs based solely on the data set. Our system, dubbed GeRNAMo (Genetic programming of RNA Motifs), predicts the most common motifs without sequence alignment and can handle motifs of any size. Our program requires only the maximum number of stems in the motif; if prior knowledge is available, the user can specify other attributes of the motif (e.g., the range of the motif's minimum and maximum sizes), thereby increasing both sensitivity and speed. We describe several experiments using ferritin iron response element (IRE), signal recognition particle (SRP), or microRNA sequences, showing that the most common motif is found repeatedly and that our system offers substantial advantages over previous methods.
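
To make the idea of scoring a candidate motif against unaligned sequences concrete, here is a minimal Python sketch. It is not GeRNAMo's actual fitness function, which the abstract does not describe. It assumes a candidate motif is a single stem-loop described by a stem length and a loop size range, and scores it by the fraction of sequences that contain it anywhere. The names `has_hairpin` and `fitness`, and the choice to allow G-U wobble pairs, are illustrative assumptions.

```python
# Hypothetical illustration only: not the scoring scheme used by GeRNAMo.
# Allowed base pairs: Watson-Crick plus the G-U wobble pair (an assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def has_hairpin(seq, stem_len, min_loop, max_loop):
    """Return True if seq contains a stem of stem_len paired bases
    enclosing a loop of min_loop..max_loop unpaired bases."""
    n = len(seq)
    for start in range(n):
        for loop in range(min_loop, max_loop + 1):
            end = start + 2 * stem_len + loop  # exclusive end of the hairpin
            if end > n:
                break  # larger loops only push the end further out
            # Base k of the 5' arm must pair with base k (from the end) of the 3' arm.
            if all((seq[start + k], seq[end - 1 - k]) in PAIRS
                   for k in range(stem_len)):
                return True
    return False


def fitness(sequences, stem_len, min_loop, max_loop):
    """Fraction of sequences containing the candidate stem-loop.
    No alignment is needed: each sequence is scanned independently."""
    if not sequences:
        return 0.0
    hits = sum(
        has_hairpin(s.upper().replace("T", "U"), stem_len, min_loop, max_loop)
        for s in sequences
    )
    return hits / len(sequences)


if __name__ == "__main__":
    toy = ["GGGAAAACCC", "AAAAAAAAAA", "GCGCUUUUGCGC"]
    print(fitness(toy, stem_len=3, min_loop=4, max_loop=6))
```

In a genetic programming setting, a score of this kind could rank candidate motifs in a population, with user-supplied bounds such as the size range narrowing the search. How GeRNAMo represents and evolves motifs is detailed in the full paper, not in this abstract.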

MeSH terms

  • Algorithms
  • Amino Acid Motifs
  • Animals
  • Base Sequence
  • Computational Biology / methods*
  • Evolution, Molecular
  • Ferritins / chemistry
  • Humans
  • MicroRNAs / chemistry
  • Models, Genetic
  • Molecular Sequence Data
  • Nucleic Acid Conformation
  • RNA / chemistry*
  • Sequence Alignment
  • Sequence Analysis, RNA

Substances

  • MicroRNAs
  • RNA
  • Ferritins