GraphClust: alignment-free structural clustering of local RNA secondary structures

Bioinformatics. 2012 Jun 15;28(12):i224-32. doi: 10.1093/bioinformatics/bts224.

Abstract

Motivation: Clustering according to sequence-structure similarity has become a generally accepted scheme for ncRNA annotation. Applying it to complete genomic sequences and whole transcriptomes is therefore desirable, but is hindered by extremely high computational costs.

Results: We present a novel linear-time, alignment-free method for comparing and clustering RNAs according to sequence and structure. The approach scales to datasets of hundreds of thousands of sequences. The quality of the retrieved clusters has been benchmarked against known ncRNA datasets and is comparable to that of state-of-the-art sequence-structure methods, while achieving speedups of several orders of magnitude. A selection of applications aimed at detecting novel structural ncRNAs is presented. As one example, we predicted local structural elements specific to lincRNAs; these elements likely link the transcripts involved functionally to vital processes of the human nervous system. In total, we predicted 349 local structural RNA elements.
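
To make the idea of an alignment-free comparison concrete, the following is a minimal sketch and not the GraphClust method itself, whose feature encoding the abstract does not describe. It assumes each RNA comes with a dot-bracket secondary structure, and it uses a hypothetical feature set: counts of k-mers over paired (nucleotide, structure) symbols. The function names `features` and `cosine` and the parameter `k` are illustrative assumptions. Feature extraction takes time linear in the sequence length, and two RNAs are compared through their feature vectors without computing any alignment.

```python
import math
from collections import Counter


def features(seq, struct, k=3):
    """Count k-mers over combined (nucleotide, structure) symbols.

    Illustrative assumption: this is a stand-in feature set, not the
    encoding used by GraphClust. Runs in time linear in len(seq).
    """
    if len(seq) != len(struct):
        raise ValueError("sequence and structure must have equal length")
    # Pair each nucleotide with its dot-bracket character, e.g. "G(" or "A."
    symbols = [n + s for n, s in zip(seq.upper(), struct)]
    counts = Counter()
    for i in range(len(symbols) - k + 1):
        counts["".join(symbols[i:i + k])] += 1
    return counts


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (no alignment needed)."""
    dot = sum(v * b.get(key, 0) for key, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy inputs invented for illustration only.
hairpin_a = features("GGGAAACCC", "(((...)))")
hairpin_b = features("GGCAAAGCC", "(((...)))")
unpaired = features("AUAUAUAUA", ".........")

print(cosine(hairpin_a, hairpin_b))  # the two hairpins share the loop k-mer
print(cosine(hairpin_a, unpaired))   # no shared k-mers
```

In this toy setting the two hairpins score higher than the hairpin against the unpaired sequence. The sketch compares one pair at a time; scaling to hundreds of thousands of sequences, as the abstract reports, would require an additional indexing or candidate-selection strategy that is not shown here.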

Availability: The GraphClust pipeline is available on request.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Base Sequence
  • Cluster Analysis
  • Computational Biology / methods*
  • Humans
  • Models, Theoretical
  • Nucleic Acid Conformation*
  • Nucleotide Motifs
  • RNA, Long Noncoding / chemistry*
  • Sequence Alignment
  • Sequence Analysis, RNA / methods*

Substances

  • RNA, Long Noncoding