Evidence for the recent origin of a bacterial protein-coding, overlapping orphan gene by evolutionary overprinting

BMC Evol Biol. 2015 Dec 18;15:283. doi: 10.1186/s12862-015-0558-z.


Background: Gene duplication is believed to be the classical way to form novel genes, but overprinting may be an important alternative. Overprinting allows entirely novel proteins to evolve de novo, i.e., formerly non-coding open reading frames within functional genes become expressed. Only three cases have been described for Escherichia coli. Here, a fourth example is presented.

Results: RNA sequencing revealed an open reading frame weakly transcribed in cow dung, coding for 101 residues and embedded completely in the -2 reading frame of citC in enterohemorrhagic E. coli. This gene is designated novel overlapping gene, nog1. The promoter region fused to gfp exhibits specific activities, and 5' rapid amplification of cDNA ends indicated that transcription starts 40 bp upstream of the start codon. Translation of nog1 was arrested strand-specifically by a nonsense mutation that is silent in citC. This nog1 mutant showed a phenotype in competitive growth against the wild type in the presence of MgCl2. Small differences in metabolite concentrations were also found. Bioinformatic analyses predict Nog1 to be inner membrane-bound and to possess at least one membrane-spanning domain. A phylogenetic analysis suggests that the orphan gene nog1 arose by overprinting after Escherichia/Shigella separated from the other γ-proteobacteria.
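
The strand-specific knockout in the Results depends on one substitution that is synonymous in the annotated gene but introduces a stop codon in the overlapping antisense frame. The Python sketch below illustrates that logic only. The 18-nt sequence is an invented toy example (not the citC/nog1 sequence), and the function names, the mutated position, and the antisense offset are hypothetical choices for illustration. The offset is a plain 0-based shift on the reverse complement and is not mapped to the "-2" frame naming used in the abstract.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str, offset: int = 0) -> str:
    """Translate complete codons starting at `offset`; '*' marks a stop."""
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


def point_mutation(seq: str, pos: int, base: str) -> str:
    """Replace the base at 0-based position `pos` on the sense strand."""
    return seq[:pos] + base + seq[pos + 1:]


def effect_in_both_frames(seq: str, pos: int, base: str, antisense_offset: int):
    """Compare wild-type and mutant translations on both strands."""
    mutant = point_mutation(seq, pos, base)
    return {
        "sense": (translate(seq), translate(mutant)),
        "antisense": (
            translate(reverse_complement(seq), antisense_offset),
            translate(reverse_complement(mutant), antisense_offset),
        ),
    }


# Hypothetical toy sequence: sense codons ATG GCA GGA TAT CTG AAA.
TOY_SENSE = "ATGGCAGGATATCTGAAA"

# GGA -> GGT (Gly -> Gly) on the sense strand; TAT -> TAA on the antisense strand.
result = effect_in_both_frames(TOY_SENSE, pos=8, base="T", antisense_offset=1)
print(result["sense"])      # ('MAGYLK', 'MAGYLK')
print(result["antisense"])  # ('FRYPA', 'FR*PA')
```

In this toy case the sense protein is unchanged while the antisense reading frame gains a premature stop. This is the property that makes a phenotype attributable to the overlapping gene rather than to the gene it is embedded in.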

Conclusions: Since nog1 is of recent origin, non-essential, short, weakly expressed and only marginally involved in E. coli's central metabolism, we propose that this gene is in an initial stage of evolution. While we present specific experimental evidence for the existence of a fourth overlapping gene in enterohemorrhagic E. coli, we believe that this may be an initial finding only and overlapping genes in bacteria may be more common than is currently assumed by microbiologists.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Bacterial Proteins / genetics
  • Base Sequence
  • Cattle
  • Codon, Initiator
  • Computational Biology
  • Enteropathogenic Escherichia coli / classification
  • Enteropathogenic Escherichia coli / genetics*
  • Enteropathogenic Escherichia coli / growth & development
  • Evolution, Molecular*
  • Feces / microbiology
  • Genes, Overlapping
  • Molecular Sequence Data
  • Open Reading Frames
  • Operon
  • Phylogeny
  • Promoter Regions, Genetic
  • Shigella / genetics

Substances

  • Bacterial Proteins
  • Codon, Initiator