Evolution of pan-genomes of Escherichia coli, Shigella spp., and Salmonella enterica

J Bacteriol. 2013 Jun;195(12):2786-92. doi: 10.1128/JB.02285-12. Epub 2013 Apr 12.


Multiple sequencing of genomes belonging to a bacterial species allows one to analyze and compare statistics and dynamics of the gene complements of species, their pan-genomes. Here, we analyzed multiple genomes of Escherichia coli, Shigella spp., and Salmonella enterica. We demonstrate that the distribution of the number of genomes harboring a gene is well approximated by a sum of two power functions, describing frequent genes (present in many strains) and rare genes (present in few strains). The virtual absence of Shigella-specific genes not present in E. coli genomes confirms previous observations that Shigella is not an independent genus. While the pan-genome size is increasing with each new strain, the number of genes present in a fixed fraction of strains stabilizes quickly. For instance, slightly fewer than 4,000 genes are present in at least half of any group of E. coli genomes. Comparison of S. enterica and E. coli pan-genomes revealed the existence of a common periphery, that is, genes present in some but not all strains of both species. Analysis of phylogenetic trees demonstrates that rare genes from the periphery likely evolve under horizontal transfer, whereas frequent periphery genes may have been inherited from the periphery genome of the common ancestor.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods
  • Escherichia coli / genetics*
  • Evolution, Molecular*
  • Genome, Bacterial*
  • Phylogeny
  • Salmonella enterica / genetics*
  • Shigella / genetics*