StrAl: progressive alignment of non-coding RNA using base pairing probability vectors in quadratic time

Bioinformatics. 2006 Jul 1;22(13):1593-9. doi: 10.1093/bioinformatics/btl142. Epub 2006 Apr 13.

Abstract

Motivation: Alignment of RNA has a wide range of applications, for example in phylogeny inference, consensus structure prediction and homology searches. Yet aligning structural or non-coding RNAs (ncRNAs) correctly is notoriously difficult as these RNA sequences may evolve by compensatory mutations, which maintain base pairing but destroy sequence homology. Ideally, alignment programs would take RNA structure into account. The Sankoff algorithm for the simultaneous solution of RNA structure prediction and RNA sequence alignment was proposed 20 years ago but suffers from its exponential complexity. A number of programs implement lightweight versions of the Sankoff algorithm by restricting its application to a limited type of structure and/or only pairwise alignment. Thus, despite recent advances, the proper alignment of multiple structural RNA sequences remains a problem.

Results: Here we present StrAl, a heuristic method for alignment of ncRNA that reduces sequence-structure alignment to a two-dimensional problem similar to standard multiple sequence alignment. The scoring function takes into account sequence similarity as well as up- and downstream pairing probability. To test the robustness of the algorithm and the performance of the program, we scored alignments produced by StrAl against a large set of published reference alignments. The quality of alignments predicted by StrAl is far better than that obtained by standard sequence alignment programs, especially when sequence homologies drop below approximately 65%; nevertheless StrAl's runtime is comparable to that of ClustalW.
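The idea of scoring sequence similarity together with up- and downstream pairing probability can be illustrated with a minimal sketch. This is not StrAl's published scoring function: the names `column_score` and `align_score`, the weight `alpha`, the match/mismatch and linear gap values, and the geometric-mean form of the structure term are all assumptions chosen for illustration. Each position is assumed to carry a base plus two precomputed probabilities (paired upstream, paired downstream), so the alignment itself is an ordinary two-dimensional dynamic program that runs in time quadratic in the sequence lengths.

```python
from typing import List, Tuple

# One position: (base, P(paired with an upstream base), P(paired with a downstream base)).
# The probabilities are assumed to be precomputed, e.g. from a partition-function fold.
Column = Tuple[str, float, float]


def column_score(a: Column, b: Column, alpha: float = 1.0,
                 match: float = 1.0, mismatch: float = -1.0) -> float:
    """Hypothetical score for aligning two positions (not StrAl's actual formula)."""
    base_a, up_a, down_a = a
    base_b, up_b, down_b = b
    seq = match if base_a == base_b else mismatch
    # Structure term: rewards positions that pair in the same direction.
    struct = (up_a * up_b) ** 0.5 + (down_a * down_b) ** 0.5
    return seq + alpha * struct


def align_score(x: List[Column], y: List[Column],
                gap: float = -2.0, alpha: float = 1.0) -> float:
    """Global alignment score via Needleman-Wunsch with a linear gap penalty.

    Runs in O(len(x) * len(y)) time and keeps only two rows in memory.
    """
    m = len(y)
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, len(x) + 1):
        cur = [i * gap] + [0.0] * m
        for j in range(1, m + 1):
            cur[j] = max(
                prev[j - 1] + column_score(x[i - 1], y[j - 1], alpha),  # align
                prev[j] + gap,                                           # gap in y
                cur[j - 1] + gap,                                        # gap in x
            )
        prev = cur
    return prev[m]
```

In this sketch, two positions with different bases but matching pairing behaviour, as after a compensatory mutation, score higher as `alpha` grows, whereas a purely sequence-based score (`alpha = 0`) treats them as a plain mismatch.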

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Base Pairing
  • Base Sequence
  • Computational Biology / methods*
  • Models, Statistical
  • Molecular Sequence Data
  • Nucleic Acid Conformation
  • Phylogeny
  • Probability
  • RNA / chemistry*
  • RNA, Untranslated / chemistry*
  • Sequence Alignment
  • Software
  • Time

Substances

  • RNA, Untranslated
  • RNA