Atypical regions in large genomic DNA sequences

Proc Natl Acad Sci U S A. 1994 Jul 19;91(15):7134-8. doi: 10.1073/pnas.91.15.7134.

Abstract

Large genomic DNA sequences contain regions with distinctive patterns of sequence organization. We describe a method that uses logarithms of probabilities based on seventh-order Markov chains to rapidly identify genomic sequences that do not resemble models of genome organization built from compilations of octanucleotide usage. Databases were constructed from Escherichia coli and Saccharomyces cerevisiae DNA sequences longer than 1000 nt and from human sequences longer than 10,000 nt. Atypical genes and clusters of genes were located in bacteriophage, yeast, and primate DNA sequences. We consider criteria for statistical significance of the results, offer possible explanations for the observed variation in genome organization, and give additional applications of these methods in DNA sequence analysis.
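
The scoring idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract states only that log probabilities come from seventh-order Markov chains built on octanucleotide usage. The function names, the additive pseudocount, the window and step sizes, and the use of a per-base mean are all assumptions made for illustration.

```python
import math
from collections import Counter

ORDER = 7          # seventh-order chain: each base is conditioned on the previous 7
ALPHABET = "ACGT"


def count_octamers(sequences):
    """Count every overlapping 8-mer (octanucleotide) in the training sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - ORDER):
            counts[seq[i:i + ORDER + 1]] += 1
    return counts


def log_prob(octamer, counts, pseudocount=1.0):
    """log P(last base | preceding 7 bases).

    Additive smoothing is an assumption here, used so that unseen
    octamers do not produce log(0).
    """
    context = octamer[:ORDER]
    context_total = sum(counts[context + base] for base in ALPHABET)
    numerator = counts[octamer] + pseudocount
    denominator = context_total + pseudocount * len(ALPHABET)
    return math.log(numerator / denominator)


def window_scores(seq, counts, window=200, step=50):
    """Return (start, mean log-probability per base) for sliding windows.

    Octamers containing symbols outside ACGT (for example N) are skipped.
    """
    seq = seq.upper()
    valid = set(ALPHABET)
    scores = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        terms = [
            log_prob(chunk[i:i + ORDER + 1], counts)
            for i in range(len(chunk) - ORDER)
            if set(chunk[i:i + ORDER + 1]) <= valid
        ]
        if terms:
            scores.append((start, sum(terms) / len(terms)))
    return scores


if __name__ == "__main__":
    # Toy data only: a repetitive "genome" and a query with a foreign insert.
    model = count_octamers(["ACGT" * 200])
    query = "ACGT" * 100 + "GGTTCCAA" * 50
    for start, score in window_scores(query, model):
        print(start, round(score, 3))
```

Windows whose mean log-probability falls well below that of the rest of the sequence are candidates for atypical regions. Deciding how far below is significant is a separate question, which the abstract says the paper addresses; this sketch does not attempt it.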

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Bacteriophage lambda / genetics
  • Base Sequence
  • DNA / analysis
  • DNA / genetics*
  • Genetic Variation
  • Globins / genetics
  • Humans
  • Information Systems
  • Markov Chains
  • Saccharomyces / genetics
  • Sequence Analysis, DNA / methods*

Substances

  • Globins
  • DNA