Comparative Analyses of Selection Operating on Nontranslated Intergenic Regions of Diverse Bacterial Species

Genetics. 2017 May;206(1):363-376. doi: 10.1534/genetics.116.195784. Epub 2017 Mar 9.


Nontranslated intergenic regions (IGRs) compose 10-15% of bacterial genomes, and contain many regulatory elements with key functions. Despite this, there are few systematic studies on the strength and direction of selection operating on IGRs in bacteria using whole-genome sequence data sets. Here we exploit representative whole-genome data sets from six diverse bacterial species: Staphylococcus aureus, Streptococcus pneumoniae, Mycobacterium tuberculosis, Salmonella enterica, Klebsiella pneumoniae, and Escherichia coli. We compare patterns of selection operating on IGRs using two independent methods: the proportion of singleton mutations and the dI/dS ratio, where dI is the number of intergenic SNPs per intergenic site. We find that the strength of purifying selection operating over all intergenic sites is consistently intermediate between that operating on synonymous and nonsynonymous sites. Ribosome binding sites and noncoding RNAs tend to be under stronger selective constraint than promoters and Rho-independent terminators. Strikingly, a clear signal of purifying selection remains even when all these major categories of regulatory elements are excluded, and this constraint is highest immediately upstream of genes. While a paucity of variation means that the data for M. tuberculosis are more equivocal than for the other species, we find strong evidence for positive selection within promoters of this species. This points to a key adaptive role for regulatory changes in this important pathogen. Our study underlines the feasibility and utility of gauging the selective forces operating on bacterial IGRs from whole-genome sequence data, and suggests that our current understanding of the functionality of these sequences is far from complete.
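
The two summary statistics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline. It assumes that dS is defined by analogy with dI (synonymous SNPs per synonymous site), that a singleton is a SNP whose minor allele occurs in exactly one genome of the sample, and that the function names and all numbers are hypothetical placeholders chosen only for illustration.

```python
def snps_per_site(n_snps, n_sites):
    """SNP density: number of SNPs divided by number of sites in the class."""
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    return n_snps / n_sites


def di_ds_ratio(intergenic_snps, intergenic_sites,
                synonymous_snps, synonymous_sites):
    """dI/dS, with dI = intergenic SNPs per intergenic site.

    Assumption: dS is computed the same way for synonymous sites.
    """
    d_i = snps_per_site(intergenic_snps, intergenic_sites)
    d_s = snps_per_site(synonymous_snps, synonymous_sites)
    if d_s == 0:
        raise ValueError("dS is zero; the ratio is undefined")
    return d_i / d_s


def proportion_singletons(minor_allele_counts):
    """Fraction of SNPs whose minor allele is seen in exactly one genome.

    minor_allele_counts holds one integer per SNP: the number of genomes
    in the sample that carry the minor allele at that SNP.
    """
    if not minor_allele_counts:
        raise ValueError("no SNPs supplied")
    singletons = sum(1 for count in minor_allele_counts if count == 1)
    return singletons / len(minor_allele_counts)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study.
    ratio = di_ds_ratio(intergenic_snps=50, intergenic_sites=10_000,
                        synonymous_snps=100, synonymous_sites=10_000)
    share = proportion_singletons([1, 1, 2, 5])
    print(f"dI/dS = {ratio:.2f}, proportion of singletons = {share:.2f}")
```

Under these assumptions, the same two functions can be applied to any site class (for example promoters or ribosome binding sites) by supplying that class's SNP and site counts, which allows the kind of between-category comparison the abstract describes.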

Keywords: bacterial genomics; intergenic regions (IGRs); purifying selection; whole-genome sequencing.

MeSH terms

  • Conserved Sequence / genetics
  • DNA, Intergenic / genetics*
  • Escherichia coli / genetics
  • Evolution, Molecular
  • Genome, Bacterial*
  • Klebsiella pneumoniae / genetics
  • Mycobacterium tuberculosis / genetics
  • RNA, Untranslated / genetics*
  • Regulatory Sequences, Nucleic Acid*
  • Ribosomes / genetics
  • Salmonella enterica / genetics
  • Staphylococcus aureus / genetics
  • Streptococcus pneumoniae / genetics

