Direct mapping of symbolic DNA sequence into frequency domain in global repeat map algorithm

Nucleic Acids Res. 2013 Jan 7;41(1):e17. doi: 10.1093/nar/gks721. Epub 2012 Sep 12.

Abstract

The main feature of the global repeat map (GRM) algorithm (www.hazu.hr/grm/software/win/grm2012.exe) is its ability to identify a broad variety of repeats of unbounded length that can be arbitrarily distant in sequences as large as human chromosomes. Its efficacy is due to the use of a complete K-string ensemble, which enables a new method of mapping a symbolic DNA sequence directly into the frequency domain, with straightforward identification of repeats as peaks in the GRM diagram. In this way, we obtain a very fast, efficient and highly automated repeat-finding tool. The method is robust to substitutions and insertions/deletions, as well as to various complexities of the sequence pattern. We present several case studies of GRM use to illustrate its capabilities: identification of α-satellite tandem repeats and higher order repeats (HORs), identification of Alu dispersed repeats and of Alu tandems, identification of the Period 3 pattern in exons, implementation of a 'magnifying glass' effect, identification of a complex HOR pattern, identification of inter-tandem transitional dispersed repeat sequences and identification of long segmental duplications. The GRM algorithm is particularly convenient for large repeat units, for highly mutated and/or complex repeats, and for global repeat maps of large genomic sequences (chromosomes and genomes).
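
The idea of reading repeat periods off as peaks can be illustrated with a small sketch. This is not the GRM implementation, whose details the abstract does not give. It is a simplified stand-in that assumes a repeat of period p makes many K-strings recur at distance p, so that a histogram of distances between consecutive occurrences of each K-string peaks at p. The function names `kmer_distance_spectrum` and `dominant_period`, and the default `k=8`, are hypothetical choices made for illustration only.

```python
from collections import Counter


def kmer_distance_spectrum(seq, k=8, max_period=None):
    """Count distances between consecutive occurrences of every k-mer.

    Illustrative simplification, not the published GRM method: each time
    a k-mer reappears, the distance to its previous occurrence adds one
    count to the spectrum.
    """
    last_seen = {}        # k-mer -> start position of its latest occurrence
    spectrum = Counter()  # distance -> number of k-mer recurrences
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        prev = last_seen.get(kmer)
        if prev is not None:
            distance = i - prev
            if max_period is None or distance <= max_period:
                spectrum[distance] += 1
        last_seen[kmer] = i
    return spectrum


def dominant_period(spectrum):
    """Return the distance with the highest count (ties go to the shorter one)."""
    if not spectrum:
        return None
    return max(spectrum.items(), key=lambda item: (item[1], -item[0]))[0]
```

For example, for a sequence made of 20 tandem copies of the 12-base unit `ACGTTGCAAGCT`, `dominant_period(kmer_distance_spectrum(seq, k=4))` is 12, because every 4-mer recurs exactly one unit later. A point substitution only disturbs the K-strings that overlap it, so the peak at the true period persists, which is one intuitive reason a K-string approach tolerates mutations.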

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Alu Elements
  • Chromosome Duplication
  • Chromosomes, Human, Pair 7 / chemistry
  • Chromosomes, Human, Y / chemistry
  • DNA / chemistry*
  • DNA, Satellite / chemistry
  • Genomics / methods
  • Humans
  • Repetitive Sequences, Nucleic Acid*
  • Sequence Analysis, DNA / methods*
  • Tandem Repeat Sequences

Substances

  • DNA, Satellite
  • DNA