Within-genome evolution of REPINs: a new family of miniature mobile DNA in bacteria

PLoS Genet. 2011 Jun;7(6):e1002132. doi: 10.1371/journal.pgen.1002132. Epub 2011 Jun 16.

Abstract

Repetitive sequences are a conserved feature of many bacterial genomes. Although they were first reported almost thirty years ago and are frequently exploited for genotyping, little is known about their origin, their maintenance, or the processes that shape their within-genome evolution. Here, beginning with an analysis of the diversity and abundance of short oligonucleotide sequences in the genome of Pseudomonas fluorescens SBW25, we show that over-represented short sequences define three distinct groups (GI, GII, and GIII) of repetitive extragenic palindromic (REP) sequences. Patterns of REP distribution suggest that closely linked REP sequences form a functional replicative unit: REP doublets are over-represented, randomly distributed in extragenic space, and more highly conserved than singlets. In addition, doublets are organized as inverted repeats, which, together with intervening spacer sequences, are predicted to form hairpin structures in ssDNA or mRNA. We refer to these newly defined entities as REPINs (REP doublets forming hairpins) and identify short reads from population sequencing that reveal putative transposition intermediates. The proximal relationship between GI, GII, and GIII REPINs and specific REP-associated tyrosine transposases (RAYTs), combined with features of the putative transposition intermediate, suggests a mechanism for within-genome dissemination. Analysis of the distribution of REPs in a range of RAYT-containing bacterial genomes, including Escherichia coli K-12 and Nostoc punctiforme, shows that REPINs are a widely distributed, but hitherto unrecognized, family of miniature non-autonomous mobile DNA.
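The two computational ideas in the abstract, counting short oligonucleotides to find over-represented ones and locating doublets arranged as inverted repeats, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names (`kmer_counts`, `find_doublets`), the exact-match criterion, and the `max_gap` spacer limit are all assumptions made for illustration, and real REP sequences are degenerate rather than identical copies.

```python
from collections import Counter

# Hypothetical helper names; exact matching is a simplifying assumption.
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


def kmer_counts(genome, k):
    """Count every overlapping k-mer on the given strand."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def find_doublets(genome, seed, max_gap):
    """Find inverted-repeat doublets of `seed`.

    Returns (seed_start, revcomp_start) pairs in which an exact copy of
    `seed` is followed, after a spacer of at most `max_gap` bases, by an
    exact copy of its reverse complement.
    """
    k = len(seed)
    rc = revcomp(seed)
    starts = range(len(genome) - k + 1)
    fwd = [i for i in starts if genome[i:i + k] == seed]
    rev = [i for i in starts if genome[i:i + k] == rc]
    pairs = []
    for i in fwd:
        for j in rev:
            gap = j - (i + k)  # spacer length between the two copies
            if 0 <= gap <= max_gap:
                pairs.append((i, j))
    return pairs


# Toy sequence: a 6-mer, a 4-base spacer, then its reverse complement.
toy = "TT" + "GGCATC" + "AAAA" + "GATGCC" + "TT"
most_common = kmer_counts(toy, 6).most_common(3)
doublets = find_doublets(toy, "GGCATC", max_gap=10)
```

In practice the k-mer counts would be compared against an expectation (for example, counts from shuffled or simulated genomes) before calling a sequence over-represented; that comparison is omitted here.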

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Bacteria / genetics*
  • Bacterial Proteins / chemistry
  • DNA, Bacterial / genetics*
  • Evolution, Molecular*
  • Gene Frequency
  • Genome, Bacterial / genetics*
  • Inverted Repeat Sequences / genetics
  • Molecular Sequence Data
  • Multigene Family
  • Oligonucleotides / genetics
  • Repetitive Sequences, Nucleic Acid / genetics*
  • Replicon / genetics
  • Sequence Alignment

Substances

  • Bacterial Proteins
  • DNA, Bacterial
  • Oligonucleotides