Analysis of DNA repeats in bacterial plasmids reveals the potential for recurrent instability events

Appl Microbiol Biotechnol. 2010 Aug;87(6):2157-67. doi: 10.1007/s00253-010-2671-7. Epub 2010 May 23.


Structural instability has frequently been observed in natural plasmids and in vectors used for protein expression or DNA vaccine development. However, there is a lack of information concerning the mapping of recombination hotspots, namely DNA repeats or sequences identical to the host genome. This led us to evaluate the abundance and distribution of direct, inverted, and tandem repeats with high recombination potential in 36 natural plasmids from ten bacterial genera, as well as in several widely used bacterial and mammalian expression vectors. In natural plasmids, we observed an overrepresentation of close direct repeats in comparison to inverted ones and a preferential location of repeats with high recombination potential in intergenic regions, suggesting a highly plastic and dynamic behavior. In plasmid vectors, we found a high density of repeats within eukaryotic promoters and non-coding sequences. As a result of this in silico analysis, we detected a spontaneous recombination between two 21-bp direct repeats present in the human cytomegalovirus early enhancer/promoter (huCMV EEP) of the pCIneo plasmid. This finding is of particular importance, as the huCMV EEP is one of the most frequently used regulatory elements in plasmid vectors. Because integration of plasmid DNA (pDNA) into host genomic DNA (gDNA) can have adverse consequences in terms of plasmid processing and host safety, we also mapped several regions with a high probability of mediating integration into the Escherichia coli or human genomes. Like the repeated regions, some of these were located in non-coding regions of the plasmids, making them preferential targets for removal.
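
The abstract does not say which software or thresholds were used for the in silico repeat search. Purely as an illustration of the idea, the following minimal Python sketch indexes every k-mer of a sequence and reports exact direct repeats (the same k-mer at two positions) and exact inverted repeats (a k-mer followed later by its reverse complement). The function names `reverse_complement` and `find_repeats`, the exact-match criterion, and the toy sequence are assumptions made for this example; they are not the authors' method, and tandem repeats and mismatches are not handled.

```python
from collections import defaultdict

# Translation table for the four unambiguous DNA bases.
# Other symbols (e.g. N) are left unchanged by str.translate.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_repeats(seq, k=21):
    """Find exact direct and inverted repeats of length k.

    Returns two sorted lists of (i, j) start positions with i < j:
    direct   -> seq[i:i+k] == seq[j:j+k]
    inverted -> seq[j:j+k] == reverse_complement(seq[i:i+k])
    This is a hypothetical helper for illustration only.
    """
    seq = seq.upper()

    # Index every k-mer by the positions where it starts.
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    direct = []
    inverted = []
    for kmer, starts in positions.items():
        # Direct repeats: every pair of occurrences of the same k-mer.
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                direct.append((starts[a], starts[b]))

        # Inverted repeats: the reverse complement occurs downstream.
        # .get() avoids adding new keys to the dict while iterating.
        rc = reverse_complement(kmer)
        for i in starts:
            for j in positions.get(rc, ()):
                if i < j:
                    inverted.append((i, j))

    return sorted(direct), sorted(inverted)


if __name__ == "__main__":
    # Toy sequence (made up): a 7-bp unit, a spacer, the unit again,
    # another spacer, then the unit's reverse complement.
    toy = "ATGCCGT" + "AAA" + "ATGCCGT" + "TTT" + "ACGGCAT"
    direct, inverted = find_repeats(toy, k=7)
    print("direct:", direct)
    print("inverted:", inverted)
```

With the default `k=21`, the same function would flag a pair of exact 21-bp direct repeats such as the one described for the huCMV EEP, provided the two copies are identical. The distance `j - i` between the paired positions gives a simple way to separate close repeats from distant ones.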

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bacteria / genetics*
  • Base Sequence
  • Genome, Human
  • Genomic Instability*
  • Humans
  • Molecular Sequence Data
  • Plasmids / genetics*
  • Repetitive Sequences, Nucleic Acid*