Whole-genome discovery of transcription factor binding sites by network-level conservation

Genome Res. 2004 Jan;14(1):99-108. doi: 10.1101/gr.1739204. Epub 2003 Dec 12.

Abstract

Comprehensive identification of DNA cis-regulatory elements is crucial for a predictive understanding of transcriptional network dynamics. Strong evidence suggests that these DNA sequence motifs are highly conserved between related species, reflecting strong selection on the network of regulatory interactions that underlie common cellular behavior. Here, we exploit a systems-level aspect of this conservation, the network-level topology of these interactions, to map transcription factor (TF) binding sites on a genomic scale. Using network-level conservation as a constraint, our algorithm finds 71% of known TF binding sites in the yeast Saccharomyces cerevisiae, using only 12% of the sequence of a phylogenetic neighbor. Most of the novel predicted motifs show strong features of known TF binding sites, such as functional category and/or expression profile coherence of their corresponding genes. Network-level conservation should provide a powerful constraint for the systematic mapping of TF binding sites in the larger genomes of higher eukaryotes.
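
The abstract does not say how network-level conservation is scored, so the following is only a minimal sketch of one plausible reading. For a candidate motif, it collects the genes whose upstream regions contain the motif in each of two species, then asks whether the two target sets overlap among orthologs more than chance would predict. The hypergeometric test, the function names (`genes_with_motif`, `hypergeom_tail`, `conservation_pvalue`), the shared ortholog IDs, and the toy sequences are all assumptions made for illustration, not details taken from the paper.

```python
from math import comb


def genes_with_motif(upstream, motif):
    """Return the IDs of genes whose upstream sequence contains the motif."""
    return {gene for gene, seq in upstream.items() if motif in seq}


def hypergeom_tail(overlap, n_a, n_b, total):
    """P(X >= overlap) when n_b genes are drawn from `total`, n_a of them marked."""
    upper = min(n_a, n_b)
    favourable = sum(
        comb(n_a, i) * comb(total - n_a, n_b - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(total, n_b)


def conservation_pvalue(motif, upstream_a, upstream_b):
    """Score a motif by the overlap of its target gene sets in two species.

    Both dicts are assumed to be keyed by a shared ortholog ID (hypothetical).
    """
    shared = upstream_a.keys() & upstream_b.keys()
    set_a = genes_with_motif(upstream_a, motif) & shared
    set_b = genes_with_motif(upstream_b, motif) & shared
    overlap = len(set_a & set_b)
    p_value = hypergeom_tail(overlap, len(set_a), len(set_b), len(shared))
    return overlap, p_value


# Toy upstream regions for six ortholog pairs (made-up sequences).
species_a = {
    "g1": "TTACGCGTAA", "g2": "GGACGCGTCC", "g3": "ACGCGTTTTT",
    "g4": "TTTTTTTTTT", "g5": "GGGGGGGGGG", "g6": "CCCCCCCCCC",
}
species_b = {
    "g1": "AAACGCGTGG", "g2": "ACGCGTAAAA", "g3": "CCCACGCGTC",
    "g4": "AAAAAAAAAA", "g5": "TTTTGGGGTT", "g6": "GGGGCCCCGG",
}

overlap, p_value = conservation_pvalue("ACGCGT", species_a, species_b)
print(overlap, p_value)
```

In this toy case the motif occurs upstream of the same three genes in both species, so the overlap is as large as it can be. A genome-scale version would repeat the calculation for every candidate k-mer and correct for multiple testing, which this sketch leaves out.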

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Validation Study

MeSH terms

  • Algorithms
  • Base Composition / genetics
  • Binding Sites / genetics
  • Binding Sites / physiology
  • Conserved Sequence / genetics*
  • Contig Mapping / methods
  • Contig Mapping / statistics & numerical data
  • DNA, Fungal / genetics
  • Genome, Fungal*
  • Humans
  • Models, Genetic
  • Models, Statistical
  • Predictive Value of Tests
  • Saccharomyces cerevisiae / genetics*
  • Transcription Factors / genetics*
  • Transcription Factors / metabolism*

Substances

  • DNA, Fungal
  • Transcription Factors