Genome-wide discovery of transcriptional modules from DNA sequence and gene expression

Bioinformatics. 2003;19 Suppl 1:i273-82. doi: 10.1093/bioinformatics/btg1038.


In this paper, we describe an approach for understanding transcriptional regulation from both gene expression and promoter sequence data. We aim to identify transcriptional modules: sets of genes that are co-regulated in a set of experiments through a common motif profile. Using the EM algorithm, our approach refines both the module assignment and the motif profile so as to best explain the expression data as a function of transcriptional motifs. It also dynamically adds and deletes motifs as required to provide a genome-wide explanation of the expression data. We evaluate the method on two Saccharomyces cerevisiae gene expression data sets, showing that our approach is better than a standard one at recovering known motifs and at generating biologically coherent modules. We also combine our results with binding localization data to obtain regulatory relationships with known transcription factors, and show that many of the inferred relationships have support in the literature.
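The alternation the abstract describes, in which EM refines module assignments and motif profiles together, can be illustrated with a deliberately simplified toy. The sketch below is not the authors' model: it assumes a plain mixture in which each module has a Bernoulli motif profile and a unit-variance Gaussian mean expression profile, and it omits the dynamic addition and deletion of motifs. The function name `em_modules`, its parameters, and the example data are all hypothetical.

```python
import math

def em_modules(motifs, expr, n_modules, n_iter=20):
    """Toy EM (an assumption, not the paper's model): each module has a
    Bernoulli motif profile and a unit-variance Gaussian expression mean."""
    n_genes, n_motifs, n_exps = len(motifs), len(motifs[0]), len(expr[0])
    # Start from a round-robin hard assignment of genes to modules.
    resp = [[1.0 if g % n_modules == k else 0.0 for k in range(n_modules)]
            for g in range(n_genes)]
    for _ in range(n_iter):
        # M-step: re-estimate priors, motif profiles, and expression means.
        weight = [sum(resp[g][k] for g in range(n_genes))
                  for k in range(n_modules)]
        prior = [(weight[k] + 1.0) / (n_genes + n_modules)
                 for k in range(n_modules)]
        # Add-one smoothing keeps every motif probability strictly in (0, 1).
        profile = [[(sum(resp[g][k] * motifs[g][m] for g in range(n_genes)) + 1.0)
                    / (weight[k] + 2.0) for m in range(n_motifs)]
                   for k in range(n_modules)]
        mean = [[sum(resp[g][k] * expr[g][e] for g in range(n_genes))
                 / max(weight[k], 1e-9) for e in range(n_exps)]
                for k in range(n_modules)]
        # E-step: posterior probability of each module for each gene.
        for g in range(n_genes):
            logp = []
            for k in range(n_modules):
                lp = math.log(prior[k])
                for m in range(n_motifs):
                    p = profile[k][m]
                    lp += math.log(p if motifs[g][m] else 1.0 - p)
                for e in range(n_exps):
                    lp -= 0.5 * (expr[g][e] - mean[k][e]) ** 2
                logp.append(lp)
            top = max(logp)  # subtract the max for numerical stability
            unnorm = [math.exp(v - top) for v in logp]
            total = sum(unnorm)
            resp[g] = [u / total for u in unnorm]
    hard = [max(range(n_modules), key=lambda k: resp[g][k])
            for g in range(n_genes)]
    return hard, profile, mean

# Hypothetical data: 6 genes, 2 motifs (1 = motif present), 2 experiments.
motifs = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
expr = [[2.0, -2.0], [1.8, -2.2], [2.2, -1.9],
        [-2.0, 2.0], [-1.9, 2.1], [-2.1, 1.8]]
assignment, profile, mean = em_modules(motifs, expr, n_modules=2)
print(assignment)
```

In this toy, genes that share a motif and an expression pattern end up in the same module, and each module's learned motif profile favours the motif its genes carry. The approach described in the abstract additionally searches over the motifs themselves, which this sketch treats as fixed inputs.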

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Validation Study

MeSH terms

  • Algorithms*
  • Amino Acid Motifs / genetics
  • Artificial Intelligence
  • Chromosome Mapping / methods*
  • Cluster Analysis
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation / physiology*
  • Genes, Regulator / genetics*
  • Genome
  • Models, Genetic
  • Models, Statistical
  • Pattern Recognition, Automated
  • Saccharomyces cerevisiae / genetics
  • Saccharomyces cerevisiae / metabolism
  • Saccharomyces cerevisiae Proteins / metabolism
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA / methods*
  • User-Computer Interface


Substances

  • Saccharomyces cerevisiae Proteins