A computational prediction of isochores based on hidden Markov models

Gene. 2006 Dec 30;385:41-9. doi: 10.1016/j.gene.2006.04.032. Epub 2006 Aug 17.

Abstract

Mammalian genomes are organised into a mosaic of regions (generally more than 300 kb long), each with a relatively homogeneous G+C content that differs from that of its neighbours. G+C content is the basic characteristic of isochores, but isochores have also been associated with many other biological properties. For instance, genes are more compact, and gene density is highest, in G+C-rich isochores. Various methods for locating isochores in the human genome have been developed, but they use only the base composition of the DNA sequence. The present paper proposes a new method, based on a hidden Markov model, that takes into account several of the biological properties associated with the isochore structure of a genome. This method yields a good segmentation of the human genome into isochores and also permits a new analysis of the known heterogeneity of G+C-rich isochores: most (60%) of the G+C-poor genes embedded in G+C-rich isochores have UTR sequences characteristic of G+C-rich genes. This genomic feature is discussed in the context of both evolution and genome function.
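The abstract does not give the model's details. As a rough illustration of how a hidden Markov model can segment a chromosome into isochores, the following is a minimal Python sketch, not the authors' method. It makes several loudly stated assumptions: it uses only windowed G+C content (whereas the paper's model also incorporates other biological properties); the hidden states are the five classical isochore families L1, L2, H1, H2 and H3, with hand-picked, illustrative Gaussian emission parameters rather than fitted ones; transitions use a single fixed self-transition probability `stay`; and all names (`gc_fraction`, `windowed_gc`, `viterbi`, `segments`, `FAMILIES`, `MEANS`, `SDS`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical isochore families with illustrative (hand-picked, NOT fitted)
# Gaussian emission parameters for the G+C fraction of a window.
FAMILIES = ["L1", "L2", "H1", "H2", "H3"]
MEANS = [0.35, 0.39, 0.43, 0.49, 0.55]
SDS = [0.015] * len(FAMILIES)


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous A/C/G/T bases (N and others ignored)."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    return (s.count("G") + s.count("C")) / acgt if acgt else 0.0


def windowed_gc(seq: str, window: int) -> List[float]:
    """G+C fraction of consecutive non-overlapping windows (last may be shorter)."""
    return [gc_fraction(seq[i:i + window]) for i in range(0, len(seq), window)]


def log_gauss(x: float, mean: float, sd: float) -> float:
    """Log density of the normal distribution N(mean, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi(obs: Sequence[float], means: Sequence[float],
            sds: Sequence[float], stay: float = 0.99) -> List[int]:
    """Most probable hidden-state path under a Gaussian-emission HMM.

    Assumes at least two states, uniform start probabilities, self-transition
    probability `stay`, and the remaining mass split evenly among other states.
    """
    if not obs:
        return []
    k = len(means)
    log_stay = math.log(stay)
    log_switch = math.log((1 - stay) / (k - 1))

    def trans(p: int, s: int) -> float:
        return log_stay if p == s else log_switch

    score = [math.log(1 / k) + log_gauss(obs[0], means[s], sds[s]) for s in range(k)]
    back: List[List[int]] = []
    for x in obs[1:]:
        # Best predecessor for each current state, then updated log scores.
        ptr = [max(range(k), key=lambda p: score[p] + trans(p, s)) for s in range(k)]
        score = [score[ptr[s]] + trans(ptr[s], s) + log_gauss(x, means[s], sds[s])
                 for s in range(k)]
        back.append(ptr)
    # Trace back from the best final state.
    state = max(range(k), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def segments(path: Sequence[int], labels: Sequence[str]) -> List[Tuple[int, int, str]]:
    """Merge runs of identical states into (start_window, end_window, label)."""
    out: List[Tuple[int, int, str]] = []
    start = 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            out.append((start, i, labels[path[start]]))
            start = i
    return out


if __name__ == "__main__":
    # Synthetic windowed G+C profile: a G+C-poor stretch, then a G+C-rich one.
    profile = [0.35, 0.36, 0.34, 0.35] * 3 + [0.54, 0.56, 0.55] * 3
    print(segments(viterbi(profile, MEANS, SDS), FAMILIES))
    # [(0, 12, 'L1'), (12, 21, 'H3')]
```

A high `stay` value penalises frequent state changes, which favours long, compositionally homogeneous segments, in line with isochores typically exceeding 300 kb. In a real analysis, `windowed_gc` would be applied to chromosome sequence with windows well below isochore length, and the emission and transition parameters would be estimated from data (for example with Baum-Welch) rather than fixed by hand.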

MeSH terms

  • 5' Untranslated Regions
  • Algorithms
  • Chromosome Mapping
  • GC Rich Sequence
  • Genome, Human
  • Humans
  • Isochores / genetics*
  • Markov Chains
  • Models, Genetic*

Substances

  • 5' Untranslated Regions
  • Isochores