Evaluation of phylogenetic footprint discovery for predicting bacterial cis-regulatory elements and revealing their evolution

BMC Bioinformatics. 2008 Jan 23;9:37. doi: 10.1186/1471-2105-9-37.

Abstract

Background: Detecting conserved motifs in promoters of orthologous genes (phylogenetic footprints) has become a common strategy for predicting cis-acting regulatory elements. Several software tools are routinely used to generate hypotheses about regulation. However, these tools are generally used as black boxes, with default parameters. Systematically evaluating the parameters of a footprint discovery strategy can substantially improve its predictions.

Results: We evaluate the performance of a footprint discovery approach based on the detection of over-represented spaced motifs. This method is particularly suitable for (but not restricted to) Bacteria, since such motifs are typically bound by factors containing a Helix-Turn-Helix domain. We evaluated footprint discovery in 368 Escherichia coli K12 genes with annotated sites, under 40 different combinations of parameters (taxonomical level, background model, organism-specific filtering, operon inference). Motifs are assessed in terms of both correctness and significance. We further report a detailed analysis of 181 bacterial orthologs of the LexA repressor. Distinct motifs are detected at various taxonomical levels, including the 7 previously characterized taxon-specific motifs. In addition, we highlight a significantly stronger conservation of half-motifs in Actinobacteria, relative to Firmicutes, suggesting an intermediate state in specificity switching between the two Gram-positive phyla, and thereby revealing the ongoing evolution of LexA auto-regulation.
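The idea of detecting over-represented spaced motifs can be illustrated with a minimal Python sketch. It is not the implementation evaluated in this work: the function names (count_spaced_dyads, binomial_tail), the choice of trinucleotide pairs, the spacing range of 0 to 20, and the binomial scoring are all assumptions made for illustration, and the two input sequences are toy data. The sketch counts every pair of short words separated by a fixed gap in a set of promoter sequences, then computes an upper-tail binomial probability that could be used to judge whether an observed count exceeds a background expectation.

```python
from collections import Counter
from math import comb


def count_spaced_dyads(sequences, k=3, spacings=range(0, 21)):
    """Count occurrences of (left word, spacing, right word) triples.

    Hypothetical helper: k and the spacing range are illustrative
    defaults, not parameters taken from the source.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for sp in spacings:
            span = 2 * k + sp  # total length covered by one spaced dyad
            for i in range(len(seq) - span + 1):
                left = seq[i:i + k]
                right = seq[i + k + sp:i + span]
                counts[(left, sp, right)] += 1
    return counts


def binomial_tail(n, x, p):
    """Return P(X >= x) for X ~ Binomial(n, p).

    Assumed scoring scheme: n is the number of scanned positions,
    x the observed count, p the expected background frequency.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(x, n + 1))


# Toy promoters (made up), each carrying CTG and CAG separated by 10 bases.
toy_promoters = [
    "AACTGTATATATATACAGTT",
    "GGCTGAAAAAAAAAACAGCC",
]
dyad_counts = count_spaced_dyads(toy_promoters)
observed = dyad_counts[("CTG", 10, "CAG")]
```

In a real analysis the background frequency p would come from a background model such as those compared in the parameter evaluation, and the resulting probabilities would need correction for the number of dyads tested.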

Conclusion: The footprint discovery method proposed here shows excellent results with E. coli and can readily be extended to predict cis-acting regulatory signals and propose testable hypotheses in bacterial genomes for which nothing is known about regulation.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Actinobacteria / genetics
  • Algorithms
  • Amino Acid Motifs / genetics
  • Bacterial Proteins / genetics
  • Conserved Sequence
  • DNA Footprinting / methods*
  • Escherichia coli K12 / genetics
  • Evolution, Molecular*
  • Genome, Bacterial
  • Gram-Positive Endospore-Forming Bacteria
  • Phylogeny
  • Promoter Regions, Genetic / genetics*
  • Sequence Homology, Nucleic Acid*
  • Serine Endopeptidases / genetics
  • Software*

Substances

  • Bacterial Proteins
  • LexA protein, Bacteria
  • Serine Endopeptidases