Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies

J Mol Biol. 1998 Sep 4;281(5):827-42. doi: 10.1006/jmbi.1998.1947.


We present here a simple and fast method for isolating DNA binding sites for transcription factors from families of coregulated genes, with results illustrated in Saccharomyces cerevisiae. Although conceptually simple, the algorithm proved efficient for extracting, from most of the yeast regulatory families analyzed, the upstream regulatory sequences that had previously been found by experimental analysis. Furthermore, putative new regulatory sites are predicted within the upstream regions of several regulons. The method is based on the detection of over-represented oligonucleotides. A distinctive feature of this approach is that it defines the statistical significance of a site based on tables of oligonucleotide frequencies observed in all non-coding sequences from the yeast genome. In contrast with heuristic methods, this oligonucleotide analysis is rigorous and exhaustive. Its range of detection is, however, limited to relatively simple patterns: short motifs with a highly conserved core. These features seem to be shared by a good number of regulatory sites in yeast. This and similar methods should be increasingly required to identify unknown regulatory elements within the numerous new coregulated families resulting from measurements of gene expression levels at the genomic scale. All tools described here are available on the web at http://copan.cifn.unam.mx/Computational_Biology/yeast-tools
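
The core idea of the abstract, detecting oligonucleotides that are over-represented in a gene family relative to a genome-wide non-coding background, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the statistical model, so the binomial tail probability, the pseudo-count, the default word length k=6, the threshold alpha, and all function names below are assumptions made for illustration.

```python
from collections import Counter
from math import exp, lgamma, log, log1p


def count_oligos(sequences, k):
    """Count every overlapping k-mer made only of A/C/G/T in a list of sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if set(word) <= set("ACGT"):
                counts[word] += 1
    return counts


def binomial_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p), summing terms in log space."""
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        log_term = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                    + i * log(p) + (n - i) * log1p(-p))
        total += exp(log_term)
    return min(total, 1.0)


def over_represented(family, background, k=6, alpha=0.01):
    """List (word, occurrences, p_value) for k-mers enriched in `family`.

    `background` stands in for the table of oligonucleotide frequencies
    observed in non-coding sequences (assumption: the expected frequency
    is estimated directly from these sequences, with a pseudo-count of 1).
    """
    fam = count_oligos(family, k)
    bg = count_oligos(background, k)
    n_fam = sum(fam.values())
    n_bg = sum(bg.values())
    hits = []
    for word, occ in fam.items():
        # Expected frequency of the word under the background model.
        p = (bg[word] + 1) / (n_bg + 4 ** k)
        p_value = binomial_tail(n_fam, occ, p)
        if p_value < alpha:
            hits.append((word, occ, p_value))
    # Most significant words first.
    return sorted(hits, key=lambda hit: hit[2])
```

Calling `over_represented(upstream_seqs, noncoding_seqs)` with two lists of DNA strings returns the enriched words sorted by p-value. A realistic analysis would also have to correct for the number of words tested and decide how to treat reverse complements and overlapping occurrences, none of which this sketch attempts.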

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Binding Sites / genetics
  • Computer Communication Networks
  • Computers
  • DNA-Binding Proteins / genetics
  • Gene Expression Regulation, Fungal / genetics*
  • Genes, Fungal / genetics*
  • Genome, Fungal
  • Oligodeoxyribonucleotides / analysis*
  • Regulatory Sequences, Nucleic Acid / genetics
  • Saccharomyces cerevisiae / genetics*
  • Transcription Factors / metabolism

Substances

  • DNA-Binding Proteins
  • Oligodeoxyribonucleotides
  • Transcription Factors