Key-string segmentation algorithm and higher-order repeat 16mer (54 copies) in human alpha satellite DNA in chromosome 7

J Theor Biol. 2003 Mar 7;221(1):29-37. doi: 10.1006/jtbi.2003.3165.

Abstract

A new key-string segmentation algorithm for identification of alpha satellite DNAs and higher-order repeat (HOR) units was introduced and exemplified. Starting with an initial key string, we determine the dominant key string and HOR. Our key-string algorithm was used to scan the recent GenBank data for human alpha satellite DNA sequence AC017075.8 (193 277 bp) from the centromeric region of chromosome 7. The sequence was computationally segmented into one HOR domain (super-repeat domain) and two non-HOR domains. The dominant key string GTTTCT provided segmentation in terms of alpha monomers. The HOR is tandemly repeated in 54 copies in the super-repeat (HOR) domain. Five insertions and three deletions in the HOR structure associated with the dominant key string were identified. A consensus HOR was constructed. Divergence of individual HOR copies from the consensus amounts to 0.7% on average, while divergence between the 16 monomer variants within each HOR is 20% on average. In the front and back domains, 199 monomer variants were identified that are not organized in HOR and diverge by 20-40%.
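The core idea of cutting a sequence at every occurrence of a key string can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `segment_by_key_string`, `length_histogram`, and `percent_divergence` are hypothetical, the toy sequence is invented, and the mismatch-based divergence measure is an assumption, since the abstract does not specify how divergence was computed.

```python
from collections import Counter


def segment_by_key_string(sequence, key):
    """Cut `sequence` immediately before each occurrence of `key`.

    Every fragment except possibly the first begins with `key`.
    Hypothetical helper for illustration only.
    """
    positions = []
    start = sequence.find(key)
    while start != -1:
        positions.append(start)
        start = sequence.find(key, start + 1)
    if not positions:
        return [sequence]
    bounds = [0] + positions + [len(sequence)]
    # Skip the empty leading fragment when the sequence starts with `key`.
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def length_histogram(fragments):
    """Count how many fragments have each length."""
    return Counter(len(f) for f in fragments)


def percent_divergence(a, b):
    """Percentage of mismatched positions between two equal-length strings.

    Assumption: a simple position-by-position comparison without alignment.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return 100.0 * mismatches / len(a)


# Toy data (invented): three copies of a 16-character unit after a 2-character lead.
toy_unit = "GTTTCT" + "A" * 10
toy_sequence = "CC" + toy_unit * 3
fragments = segment_by_key_string(toy_sequence, "GTTTCT")
histogram = length_histogram(fragments)
```

On the toy input, the segmentation yields the 2-character leading fragment followed by three 16-character fragments, so the length histogram peaks at the repeat-unit length. Under this reading, a candidate key string whose fragment lengths cluster tightly around one value (or multiples of it) would be a natural choice for the dominant key string.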

MeSH terms

  • Algorithms*
  • Base Sequence
  • Chromosomes, Human, Pair 7 / genetics*
  • Computational Biology / methods*
  • DNA, Satellite / genetics*
  • Databases, Nucleic Acid
  • Humans
  • Molecular Sequence Data
  • Repetitive Sequences, Nucleic Acid / genetics*

Substances

  • DNA, Satellite