Studying the functional conservation of cis-regulatory modules and their transcriptional output

BMC Bioinformatics. 2008 Apr 29:9:220. doi: 10.1186/1471-2105-9-220.

Abstract

Background: Cis-regulatory modules (CRMs) are distinct, genomic regions surrounding the target gene that can independently activate the promoter to drive transcription. The activation of a CRM is controlled by the binding of a certain combination of transcription factors (TFs). It would be of great benefit if the transcriptional output mediated by a specific CRM could be predicted. Of equal benefit would be identifying in silico a specific CRM as the driver of the expression in a specific tissue or situation. We extend a recently developed biochemical modeling approach to manage both prediction tasks. Given a set of TFs, their protein concentrations, and the positions and binding strengths of each of the TFs in a putative CRM, the model predicts the transcriptional output of the gene. Our approach predicts the location of the regulating CRM by using predicted TF binding sites in regions near the gene as input to the model and searching for the region that yields a predicted transcription rate most closely matching the known rate.

Results: Here we show the ability of the model on the example of one of the CRMs regulating the eve gene, MSE2. A model trained on the MSE2 in D. melanogaster was applied to the surrounding sequence of the eve gene in seven other Drosophila species. The model successfully predicts the correct MSE2 location and output in six out of eight Drosophila species we examine.

Conclusion: The model is able to generalize from D. melanogaster to other Drosophila species and accurately predicts the location and transcriptional output of MSE2 in those species. However, we also show that the current model is not specific enough to function as a genome-wide CRM scanner, because it incorrectly predicts other genomic regions to be MSE2s.

MeSH terms

  • Algorithms*
  • Base Sequence
  • Chromosome Mapping / methods*
  • Conserved Sequence / genetics*
  • Molecular Sequence Data
  • Promoter Regions, Genetic / genetics*
  • Regulatory Sequences, Nucleic Acid / genetics*
  • Sequence Analysis, DNA / methods*
  • Transcription Factors / genetics*

Substances

  • Transcription Factors