Word frequency analysis reveals enrichment of dinucleotide repeats on the human X chromosome and [GATA]n in the X escape region

Genome Res. 2006 Apr;16(4):477-84. doi: 10.1101/gr.4627606. Epub 2006 Mar 13.

Abstract

Most of the human genome encodes neither protein nor known functional RNA, yet available approaches to seek meaningful information in the "noncoding" sequence are limited. The unique biology of the X chromosome, one of which is silenced in mammalian females, can yield clues into sequence motifs involved in chromosome packaging and function. Although autosomal chromatin has some capacity for inactivation, evidence indicates that sequences enriched on the X chromosome render it fully competent for silencing, except in specific regions that escape inactivation. Here we have used a linguistic approach by analyzing the frequency and distribution of nine base-pair genomic "words" throughout the human genome. Results identify previously unknown sequence differences on the human X chromosome. Notably, the dinucleotide repeats [AT]n, [AC]n, and [AG]n are significantly enriched across the X chromosome compared with autosomes. Moreover, a striking enrichment (>10-fold) of [GATA]n is revealed throughout the 10-Mb segment at Xp22 that escapes inactivation, and is confirmed by fluorescence in situ hybridization. A similar enrichment is found in other eutherian genomes. Our findings clearly demonstrate sequence differences relevant to the novel biology and evolution of the X chromosome. Furthermore, they implicate simple sequence repeats, linked to gene regulation and unusual DNA structures, in the regulation and formation of facultative heterochromatin. Results suggest a new paradigm whereby a regional escape from X inactivation is due to the presence of elements that prevent heterochromatinization, rather than the lack of other elements that promote it.
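The word-frequency approach described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline. The function names, the use of overlapping windows on a single strand, the skipping of words that contain ambiguous bases, and the add-one pseudocount are all assumptions made for illustration. The sketch counts nine-base-pair words in a sequence and compares the relative frequency of one word between a target sequence (for example, an X-linked region) and a background sequence (for example, autosomal sequence).

```python
from collections import Counter

WORD_LENGTH = 9  # the abstract analyzes nine-base-pair "words"
VALID_BASES = set("ACGT")


def count_words(sequence, k=WORD_LENGTH):
    """Count every overlapping k-base word that contains only A, C, G, or T."""
    sequence = sequence.upper()
    counts = Counter()
    for start in range(len(sequence) - k + 1):
        word = sequence[start:start + k]
        # Assumption: words with N or other ambiguity codes are ignored.
        if set(word) <= VALID_BASES:
            counts[word] += 1
    return counts


def enrichment(target_counts, background_counts, word, pseudocount=1.0):
    """Ratio of a word's relative frequency in target versus background.

    Assumption: a pseudocount is added to numerator and denominator so that
    a word absent from the background does not cause division by zero.
    """
    target_total = sum(target_counts.values())
    background_total = sum(background_counts.values())
    target_freq = (target_counts[word] + pseudocount) / (target_total + pseudocount)
    background_freq = (background_counts[word] + pseudocount) / (
        background_total + pseudocount
    )
    return target_freq / background_freq
```

A hypothetical use would call `count_words` once on the sequence of interest and once on the background, then pass both counters to `enrichment` with a repeat-derived word such as `"GATAGATAG"`. A ratio well above 1 indicates that the word is over-represented in the target. A real analysis would also need to handle both strands, repeat phase (for example, `"ATAGATAGA"` belongs to the same [GATA]n repeat), and statistical significance, none of which is attempted here.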

Publication types

  • Comparative Study
  • Research Support, N.I.H., Extramural

MeSH terms

  • Chromosome Mapping / methods
  • Chromosomes, Human, X / genetics*
  • Dinucleotide Repeats / genetics*
  • Evolution, Molecular*
  • Female
  • Gene Expression Regulation / genetics*
  • Genome, Human / genetics*
  • Heterochromatin / genetics
  • Humans
  • Male
  • X Chromosome Inactivation / genetics

Substances

  • Heterochromatin