A comparative genomics approach to prediction of new members of regulons

Genome Res. 2001 Apr;11(4):566-84. doi: 10.1101/gr.149301.


Identifying the complete transcriptional regulatory network for an organism is a major challenge. For each regulatory protein, we want to know all the genes it regulates, that is, its regulon. Examples of known binding sites can be used to estimate the binding specificity of the protein and to predict other binding sites. However, binding site predictions can be unreliable, because the considerable variability of binding sites makes the true specificity of the protein difficult to determine. Because regulatory systems tend to be conserved through evolution, we can use comparisons between species to increase the reliability of binding site predictions. In this article, an approach is presented to evaluate the computational predictions of regulatory sites. We combine the prediction of transcription units having orthologous genes with the prediction of transcription factor binding sites based on probabilistic models. We augment the sets of genes in Escherichia coli that are expected to be regulated by two transcription factors, the cAMP receptor protein and the fumarate and nitrate reduction regulatory protein, through a comparison with the Haemophilus influenzae genome. At the same time, we learn more about the regulatory networks of H. influenzae, a species with much less experimental knowledge than E. coli. By studying orthologous genes subject to regulation by the same transcription factor, we also gain understanding of the evolution of entire regulatory systems.
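The idea of combining a probabilistic binding site model with cross-species comparison can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple log-odds position weight matrix with a uniform background, and the function names, toy site sequences, gene names, and threshold are all hypothetical and chosen only for illustration. A gene is kept as a candidate regulon member only if the upstream regions of both orthologs contain a site scoring above the threshold.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0):
    """Build a log2-odds weight matrix from aligned, equal-length sites.

    Assumes a uniform background (0.25 per base); this is a simplification.
    """
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for i in range(width):
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[i]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / 0.25) for b in BASES})
    return pwm

def score(pwm, window):
    """Sum the per-position weights for one window of the same width."""
    return sum(col[base] for col, base in zip(pwm, window))

def best_hit(pwm, sequence):
    """Return (score, position) of the best-scoring window in a sequence."""
    width = len(pwm)
    hits = [(score(pwm, sequence[i:i + width]), i)
            for i in range(len(sequence) - width + 1)]
    return max(hits) if hits else (float("-inf"), -1)

def conserved_candidates(pwm, upstream_a, upstream_b, threshold):
    """Keep genes whose orthologs in both species have a hit >= threshold."""
    kept = []
    for gene, seq_a in upstream_a.items():
        seq_b = upstream_b.get(gene)
        if seq_b is None:
            continue  # no ortholog in the second species
        if (best_hit(pwm, seq_a)[0] >= threshold
                and best_hit(pwm, seq_b)[0] >= threshold):
            kept.append(gene)
    return kept

# Toy data (hypothetical, not real binding sites or genes).
toy_sites = ["TGTGA", "TGTGA", "TGCGA", "TGTGA"]
pwm = build_pwm(toy_sites)
species_a = {"geneX": "AAATGTGACCC", "geneY": "AAAAAAAAAAA"}
species_b = {"geneX": "CCTGTGAGG", "geneY": "GGTGTGAGG"}
candidates = conserved_candidates(pwm, species_a, species_b, threshold=5.0)
```

In this toy case only the gene with a strong match upstream of both orthologs survives the filter, which is the sense in which conservation can raise confidence in an individual prediction. A realistic analysis would also need ortholog and transcription unit prediction, both strands, and a calibrated threshold, none of which are shown here.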

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Amino Acid Sequence
  • Bacterial Proteins / genetics
  • Binding Sites / genetics
  • Computational Biology* / methods
  • Computational Biology* / statistics & numerical data
  • Conserved Sequence
  • Cyclic AMP Receptor Protein / genetics
  • DNA-Binding Proteins / genetics
  • Escherichia coli / genetics
  • Escherichia coli Proteins*
  • Genome, Bacterial*
  • Genomics / methods*
  • Genomics / statistics & numerical data
  • Iron-Sulfur Proteins / genetics
  • Molecular Sequence Data
  • Regulon / genetics*
  • Sequence Alignment / methods
  • Sequence Alignment / statistics & numerical data
  • Transcription Factors / genetics

Substances

  • Bacterial Proteins
  • Cyclic AMP Receptor Protein
  • DNA-Binding Proteins
  • Escherichia coli Proteins
  • FNR protein, E coli
  • Iron-Sulfur Proteins
  • Transcription Factors