Study of statistical correlations in DNA sequences

Gene. 2002 Oct 30;300(1-2):105-15. doi: 10.1016/s0378-1119(02)01037-5.

Abstract

Here we present a study of statistical correlations among different positions in DNA sequences, and of their implications, by directly using the autocorrelation function. Such an analysis is now possible because of the availability of large sequences or even complete genomes of many organisms. After describing the way in which the autocorrelation function can be applied to DNA-sequence analysis, we show that long-range correlations, implying scale independence, appear in several bacterial genomes as well as in long human chromosome contigs. The source of such correlations in bacteria, which may extend up to 60 kb in Bacillus subtilis, may be related to massive lateral transfer of compositionally biased genes from other genomes. In the human genome, correlations extend for more than five decades and may be related to the evolution of the 'neogenome', a modern evolutionary acquisition composed of GC-rich isochores displaying long-range correlations and scale invariance.
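
As an illustration of the kind of calculation the abstract refers to, the following is a minimal sketch of a normalized autocorrelation function computed on a DNA sequence. The abstract does not give the authors' exact definition, so everything here is an assumption: the numerical mapping (1 for G or C, 0 for A or T), the normalization by the variance, and the function names `gc_indicator` and `autocorrelation` are hypothetical choices made for illustration only.

```python
def gc_indicator(seq):
    """Map a DNA string to 1 (G or C) or 0 (A or T).

    Assumption: other symbols (e.g. N) are simply dropped, which
    slightly shifts distances; a careful analysis would handle them
    explicitly.
    """
    out = []
    for base in seq.upper():
        if base in "GC":
            out.append(1)
        elif base in "AT":
            out.append(0)
    return out


def autocorrelation(x, max_lag):
    """Return [C(0), ..., C(max_lag)] for the numeric series x.

    C(k) is the mean of (x[i] - mean) * (x[i+k] - mean) over all valid i,
    divided by the variance of x, so that C(0) equals 1.
    """
    n = len(x)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the series length")
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    if var == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    result = []
    for k in range(max_lag + 1):
        m = n - k  # number of position pairs separated by distance k
        cov = sum((x[i] - mean) * (x[i + k] - mean) for i in range(m)) / m
        result.append(cov / var)
    return result


if __name__ == "__main__":
    # Toy periodic sequence, not real genomic data.
    series = gc_indicator("GCAT" * 50)
    for lag, value in enumerate(autocorrelation(series, 4)):
        print(lag, round(value, 3))
```

This direct double loop costs on the order of N × max_lag operations, which is fine for short sequences; for whole genomes an FFT-based estimate would usually be preferred. In a sequence with long-range correlations, C(k) would be expected to decay slowly with distance k (for example, roughly as a power law) rather than vanishing after a short characteristic length.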

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • DNA / genetics*
  • DNA, Bacterial / genetics
  • Genome, Bacterial
  • Genome, Human
  • Humans
  • Sequence Analysis, DNA / methods
  • Sequence Analysis, DNA / statistics & numerical data*
  • Statistics as Topic

Substances

  • DNA, Bacterial
  • DNA