Large-scale computational and statistical analyses of high transcription potentialities in 32 prokaryotic genomes

Nucleic Acids Res. 2008 Jun;36(10):3332-40. doi: 10.1093/nar/gkn135. Epub 2008 Apr 25.

Abstract

This article compares 32 bacterial genomes with respect to their high transcription potentialities. The sigma70 promoter has been widely studied in the Escherichia coli model, and a consensus is known. Since transcriptional regulation is known to compensate for promoter weakness (i.e. when the similarity of the promoter to the consensus is rather low), predicting functional promoters is a hard task. The work presented here instead investigates potentially high ORF expression with respect to three criteria: (i) high similarity to the sigma70 consensus (namely, the consensus variant appropriate for each genome), (ii) reinforcement of transcription strength through a supplementary binding site, the upstream promoter (UP) element, and (iii) enhancement through an optimal Shine-Dalgarno (SD) sequence. We show that in the AT-rich Firmicutes genomes, frequencies of potentially strong sigma70-like promoters are exceptionally high. Moreover, some genomes that contain a low number of strong promoters (SPs) may nonetheless show a high proportion of promoters harbouring a UP element. Putative SPs of lesser quality are more frequently associated with a UP element than putative SPs of better quality. A statistically significant difference is found when comparing bacterial genomes with similarly AT-rich genomes generated at random; the difference is greatest for Firmicutes. Comparing some Firmicutes genomes with similarly AT-rich Proteobacteria genomes, we confirm the Firmicutes specificity. We show that this specificity is explained neither by AT bias nor by genome size bias; nor does it originate in the abundance of optimal SD sequences, a typical and significant feature of Firmicutes that is analysed more thoroughly in our study.
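
Criterion (i) can be illustrated with a minimal sketch, which is not the method used in the study. It assumes the textbook E. coli sigma70 hexamers (TTGACA at -35, TATAAT at -10) separated by a spacer of 16 to 18 bp, and scores a candidate by counting matching positions. The function names, the spacer range and the threshold of 10 matches out of 12 are illustrative assumptions; the abstract states that a genome-specific consensus variant is used but does not give the scoring scheme.

```python
# Assumed textbook E. coli sigma70 consensus hexamers; the study uses a
# consensus variant per genome, which is not reproduced here.
MINUS_35 = "TTGACA"
MINUS_10 = "TATAAT"


def matches(site, consensus):
    """Count positions at which a candidate site agrees with the consensus."""
    return sum(a == b for a, b in zip(site, consensus))


def scan_promoters(seq, min_score=10, spacers=(16, 17, 18)):
    """Return (start, spacer, score) for each candidate -35/-10 pair.

    start  : 0-based index of the -35 hexamer in seq
    spacer : number of bases between the two hexamers
    score  : matching positions out of 12 (hypothetical scoring scheme)
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 6 + 1):
        box35 = seq[i:i + 6]
        for sp in spacers:
            j = i + 6 + sp          # start of the candidate -10 hexamer
            if j + 6 > len(seq):
                continue            # -10 hexamer would run past the end
            box10 = seq[j:j + 6]
            score = matches(box35, MINUS_35) + matches(box10, MINUS_10)
            if score >= min_score:
                hits.append((i, sp, score))
    return hits


# A synthetic sequence carrying a perfect consensus with a 17 bp spacer.
example = "GG" + MINUS_35 + "C" * 17 + MINUS_10 + "GG"
print(scan_promoters(example))  # [(2, 17, 12)]
```

A match count is the simplest possible similarity measure; a position weight matrix would distinguish conserved from variable positions, but the abstract gives no detail that would justify one choice over the other.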

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • AT Rich Sequence
  • Base Sequence
  • Computational Biology
  • Consensus Sequence
  • DNA-Directed RNA Polymerases / metabolism*
  • Data Interpretation, Statistical
  • Enhancer Elements, Genetic
  • Escherichia coli / genetics
  • Genome, Bacterial*
  • Genomics
  • Open Reading Frames
  • Promoter Regions, Genetic*
  • Sigma Factor / metabolism*
  • Thermotoga maritima / genetics
  • Transcription, Genetic*

Substances

  • Sigma Factor
  • RNA polymerase sigma 70
  • DNA-Directed RNA Polymerases