ColorHOR--novel graphical algorithm for fast scan of alpha satellite higher-order repeats and HOR annotation for GenBank sequence of human genome

Bioinformatics. 2005 Apr 1;21(7):846-52. doi: 10.1093/bioinformatics/bti072. Epub 2004 Oct 27.

Abstract

Motivation: GenBank data currently lack alpha satellite higher-order repeat (HOR) annotation. Furthermore, exact HOR consensus lengths have not been reported so far. Given the fast growth of sequence data for centromeric regions, efficient tools for computational identification and analysis of HORs in known sequences are of increasing interest.

Results: We develop a graphical user interface method, ColorHOR, for fast computational identification of HORs in a given genomic sequence, without requiring a priori information on its composition. ColorHOR is based on an extension of the key-string algorithm and provides a color representation of the order and orientation of HORs. For the key string, we use a robust 6 bp string from a consensus alpha satellite, and we test how representative it is. The ColorHOR algorithm provides a direct visual identification of HORs (direct and/or reverse complement). In more detail, we first illustrate the ColorHOR results for human chromosome 1. Using ColorHOR, we determine for the first time the HOR annotation of the GenBank sequence of the whole human genome. In addition to HORs corresponding to those previously determined biochemically, we find new HORs in chromosomes 4, 8, 9, 10, 11 and 19. For the first time, we determine exact consensus lengths of HORs in 10 chromosomes. We propose that the HOR assignment obtained by using ColorHOR be included in the GenBank database.
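The key-string idea underlying this approach can be illustrated with a minimal sketch. It is not the ColorHOR implementation: the abstract does not give the actual 6 bp key string, so `GATTAC` below is a hypothetical placeholder, and all function names are assumptions made for illustration. The sketch cuts a sequence at every occurrence of the key, records the fragment lengths, and looks for the smallest period at which the length pattern repeats exactly. Real alpha satellite arrays diverge between copies, so a practical tool would need some tolerance rather than exact matching.

```python
# Illustrative sketch only; the key string and all names are hypothetical.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def key_positions(seq, key):
    """Return the start index of every (possibly overlapping) key occurrence."""
    seq, key = seq.upper(), key.upper()
    positions = []
    i = seq.find(key)
    while i != -1:
        positions.append(i)
        i = seq.find(key, i + 1)
    return positions


def fragment_lengths(seq, key):
    """Distances between consecutive key occurrences (the 'fragments')."""
    pos = key_positions(seq, key)
    return [b - a for a, b in zip(pos, pos[1:])]


def smallest_period(lengths, min_repeats=2):
    """Smallest p such that lengths[i] == lengths[i + p] for every valid i.

    Exact matching is a simplification; returns None if no period is found.
    """
    n = len(lengths)
    for p in range(1, n // min_repeats + 1):
        if all(lengths[i] == lengths[i + p] for i in range(n - p)):
            return p
    return None


def oriented_hits(seq, key):
    """Tag key hits as '+' (direct) or '-' (reverse complement), by position."""
    hits = [(p, "+") for p in key_positions(seq, key)]
    hits += [(p, "-") for p in key_positions(seq, reverse_complement(key))]
    return sorted(hits)


if __name__ == "__main__":
    key = "GATTAC"  # hypothetical placeholder, not the key used in the paper
    # Toy "HOR" made of three monomers of lengths 10, 13 and 11, copied 3 times.
    unit = key + "T" * 4 + key + "T" * 7 + key + "T" * 5
    toy = unit * 3 + key
    lengths = fragment_lengths(toy, key)
    period = smallest_period(lengths)
    print("fragment lengths:", lengths)
    print("monomers per HOR:", period)
    print("HOR length (bp):", sum(lengths[:period]))
```

In this toy case the period of the fragment-length array corresponds to the number of monomers per HOR copy, and the sum of one period gives the HOR length. Scanning with both the key and its reverse complement, as in `oriented_hits`, is one simple way to distinguish direct from reverse-complement orientation.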

Publication types

  • Evaluation Study

MeSH terms

  • Algorithms*
  • Chromosome Mapping / methods*
  • Chromosomes, Human, Pair 1 / genetics*
  • Color
  • Computer Graphics*
  • Consensus Sequence / genetics
  • DNA, Satellite / analysis
  • DNA, Satellite / genetics*
  • Database Management Systems
  • Databases, Nucleic Acid*
  • Genome, Human
  • Humans
  • Information Storage and Retrieval / methods
  • Microsatellite Repeats / genetics
  • Sequence Analysis, DNA / methods*
  • Software
  • User-Computer Interface*

Substances

  • DNA, Satellite