Statistical evaluation and biological interpretation of non-random abundance in the E. coli K-12 genome of tetra- and pentanucleotide sequences related to VSP DNA mismatch repair

Nucleic Acids Res. 1992 Apr 11;20(7):1657-62. doi: 10.1093/nar/20.7.1657.

Abstract

The abundance of all tetra- and pentanucleotide sequences is calculated for a set of DNA sequence data comprising 767,393 nucleotides of the E. coli K-12 genome. Observed frequencies are compared to those expected from a Markov chain prediction algorithm. Systematic and extreme non-random representations are found for special sets of sequences. These are interpreted as arising from incorporation of a 2'-deoxyguanosine residue opposite thymidine during replication which, in special sequence contexts, leads to a T/G mismatch that is simultaneously a substrate for two competing DNA mismatch repair systems: the mutHLS pathway and the very short patch (VSP) pathway. Processing by the former leads to error correction, by the latter to mutation fixation. The significance of the latter process, as demonstrated here, makes it unlikely that VSP repair has evolved mainly as a mutation avoidance mechanism. It is proposed that in E. coli K-12, VSP repair, together with DNA cytosine methylation, constitutes a mutagenesis/recombination system capable of promoting gene-conversion-like unidirectional transfer of short stretches of DNA sequence.
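The comparison of observed to Markov-predicted abundance can be illustrated with a minimal sketch. The abstract does not state the order of the Markov chain or the exact estimator, so the code below assumes the common maximal-order estimate, in which the expected count of a word is N(prefix) × N(suffix) / N(core), with prefix and suffix being the two overlapping (k-1)-mers and core the shared (k-2)-mer. The function names (`count_kmers`, `markov_expected`, `observed_vs_expected`) are hypothetical, and only one strand is counted.

```python
from collections import Counter
from itertools import product


def count_kmers(seq, k):
    """Count overlapping k-mers on one strand of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def markov_expected(word, counts_km1, counts_km2):
    """Assumed maximal-order Markov estimate of the expected count of word.

    E(w1..wk) = N(w1..w(k-1)) * N(w2..wk) / N(w2..w(k-1))
    Returns 0.0 when the central (k-2)-mer never occurs.
    """
    core = counts_km2[word[1:-1]]
    if core == 0:
        return 0.0
    return counts_km1[word[:-1]] * counts_km1[word[1:]] / core


def observed_vs_expected(seq, k):
    """Return (word, observed, expected, observed/expected) for all 4**k words.

    Requires k >= 3 so that the central (k-2)-mer is non-empty.
    """
    obs = count_kmers(seq, k)
    km1 = count_kmers(seq, k - 1)
    km2 = count_kmers(seq, k - 2)
    rows = []
    for letters in product("ACGT", repeat=k):
        word = "".join(letters)
        exp = markov_expected(word, km1, km2)
        # Ratio is undefined when nothing is expected.
        ratio = obs[word] / exp if exp > 0 else float("nan")
        rows.append((word, obs[word], exp, ratio))
    return rows
```

Under these assumptions, a ratio well below 1 marks a word as under-represented relative to its shorter constituents and a ratio well above 1 as over-represented. Judging whether a deviation is statistically significant would need an additional test that this sketch does not include.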

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Bacillus subtilis / genetics
  • Base Composition
  • DNA Repair / genetics*
  • DNA Replication / genetics
  • DNA, Bacterial / chemistry
  • DNA, Bacterial / genetics*
  • DNA-Cytosine Methylases / genetics
  • Escherichia coli / genetics*
  • Genome, Bacterial
  • Markov Chains
  • Mutation / genetics
  • Repetitive Sequences, Nucleic Acid / genetics*

Substances

  • DNA, Bacterial
  • DNA-Cytosine Methylases