Triad pattern algorithm for predicting strong promoter candidates in bacterial genomes

BMC Bioinformatics. 2008 May 9:9:233. doi: 10.1186/1471-2105-9-233.

Abstract

Background: Bacterial promoters, which increase the efficiency of gene expression, differ from other promoters in several characteristics. This difference, not yet widely exploited in bioinformatics, looks promising for the development of relevant computational tools to search for strong promoters in bacterial genomes.

Results: We describe a new triad pattern algorithm that predicts strong promoter candidates in annotated bacterial genomes by matching specific patterns for the group I sigma70 factors of Escherichia coli RNA polymerase. It detects promoter-specific motifs by consecutively matching three patterns: an UP-element, required for interaction with the alpha subunit, followed by optimally separated -35 and -10 boxes, required for interaction with the sigma70 subunit of RNA polymerase. Analysis of 43 bacterial genomes revealed that the frequency of candidate sequences depends on the A+T content of the DNA under examination. The accuracy of in silico prediction was experimentally validated for the genome of a hyperthermophilic bacterium, Thermotoga maritima, by applying a cell-free expression assay using the predicted strong promoters. In this organism, the strong promoters govern genes for translation, energy metabolism, transport, cell movement, and other as-yet unidentified functions.
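
The consecutive three-pattern match described above can be illustrated with a minimal Python sketch. The abstract does not give the authors' actual patterns, so everything concrete here is an assumption: the -35 and -10 boxes are the textbook E. coli sigma70 consensus hexamers (TTGACA and TATAAT), the 16-18 bp spacer is the commonly cited near-optimal range, and the UP-element is approximated by a hypothetical run of at least 10 A/T bases ending within 4 bp of the -35 box. The name find_triads is illustrative and not taken from the paper.

```python
import re

# Illustrative stand-ins only; NOT the patterns used by the published algorithm.
UP_ELEMENT = "[AT]{10,}"    # assumption: A/T-rich run approximating an UP-element
BOX_35 = "TTGACA"           # textbook sigma70 -35 consensus hexamer
SPACER = "[ACGT]{16,18}"    # assumption: near-optimal -35/-10 spacing
BOX_10 = "TATAAT"           # textbook sigma70 -10 consensus hexamer

# The three patterns are matched consecutively, left to right:
# UP-element, short gap, -35 box, spacer, -10 box.
TRIAD = re.compile(
    "(?P<up>" + UP_ELEMENT + ")"
    "(?P<gap>[ACGT]{0,4})"
    "(?P<box35>" + BOX_35 + ")"
    "(?P<spacer>" + SPACER + ")"
    "(?P<box10>" + BOX_10 + ")"
)


def find_triads(sequence):
    """Return non-overlapping triad matches on the given strand only."""
    hits = []
    for match in TRIAD.finditer(sequence.upper()):
        hits.append(
            {
                "start": match.start("up"),      # 0-based start of the UP-element
                "end": match.end("box10"),       # 0-based end (exclusive) of the -10 box
                "up": match.group("up"),
                "spacer_len": len(match.group("spacer")),
            }
        )
    return hits


if __name__ == "__main__":
    # Synthetic sequence built to contain exactly one triad with a 17 bp spacer.
    demo = (
        "GCGC" + "AAAATTTTAAAATTTT" + "GC" + "TTGACA"
        + "GC" * 8 + "G" + "TATAAT" + "GCGC"
    )
    for hit in find_triads(demo):
        print(hit)
```

This sketch requires exact hexamer matches and scans one strand; a practical tool would presumably tolerate mismatches, scan the reverse complement, and restrict the search to regions upstream of annotated genes.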

Conclusion: The triad pattern algorithm developed for predicting strong bacterial promoters is well suited for analyzing bacterial genomes with an A+T content of less than 62%. This computational tool opens new prospects for investigating global gene expression and individual strong promoters in bacteria of medical and/or economic significance.
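
As a rough illustration of the 62% A+T criterion, the following Python sketch computes the A+T percentage of a sequence and compares it with that limit. The function names are hypothetical, and ignoring ambiguous bases such as N is an assumption made here, not something stated in the abstract.

```python
AT_LIMIT_PERCENT = 62.0  # threshold quoted in the conclusion above


def at_content(sequence):
    """Return the A+T percentage, counting only unambiguous A, C, G, T bases."""
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return 100.0 * (seq.count("A") + seq.count("T")) / counted


def within_recommended_range(sequence):
    """True if the A+T content is below the 62% limit."""
    return at_content(sequence) < AT_LIMIT_PERCENT


if __name__ == "__main__":
    example = "AATTGGCC"  # synthetic sequence, half A/T
    print(at_content(example), within_recommended_range(example))
```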

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • AT Rich Sequence / physiology
  • Algorithms
  • Amino Acid Motifs / genetics
  • Cell-Free System
  • Computational Biology / methods*
  • DNA-Directed RNA Polymerases / analysis
  • DNA-Directed RNA Polymerases / genetics
  • Escherichia coli Proteins / analysis
  • Escherichia coli Proteins / genetics
  • Genome, Bacterial*
  • Pattern Recognition, Automated / methods*
  • Promoter Regions, Genetic*
  • Protein Subunits / analysis
  • Protein Subunits / genetics
  • Protein Subunits / metabolism
  • Sequence Alignment
  • Sequence Analysis, DNA / methods
  • Sigma Factor / analysis*
  • Sigma Factor / genetics*
  • Thermotoga maritima / genetics*

Substances

  • Escherichia coli Proteins
  • Protein Subunits
  • Sigma Factor
  • DNA-Directed RNA Polymerases