BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes

Pac Symp Biocomput. 2001;127-38.


Genome sequencing and DNA microarray analysis of gene expression have created a demand for data-mining tools. BioProspector, a C program using a Gibbs sampling strategy, examines the upstream regions of genes in the same expression pattern group and looks for regulatory sequence motifs. BioProspector uses zero- to third-order Markov background models whose parameters are either given by the user or estimated from a specified sequence file. The significance of each motif found is judged against a motif score distribution estimated by a Monte Carlo method. In addition, BioProspector modifies the motif model used in earlier Gibbs samplers to allow the modeling of gapped motifs and motifs with palindromic patterns. Together, these modifications greatly improve the performance of the program. Although testing and development are still in progress, the program has shown preliminary success in finding the binding motifs for Saccharomyces cerevisiae RAP1, Bacillus subtilis RNA polymerase, and Escherichia coli CRP. We are currently working on combining BioProspector with a clustering program to explore gene expression networks and regulatory mechanisms.
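To make the background-model idea concrete, the following is a minimal Python sketch of how a Markov background model of order 0 to 3 might be estimated from a sequence file and then used to score a candidate segment. It is not BioProspector's C implementation: the function names `train_background` and `background_log_prob`, the `pseudocount` smoothing, and the choice to skip contexts containing non-ACGT characters are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from itertools import product

ALPHABET = "ACGT"

def train_background(sequences, order=3, pseudocount=1.0):
    """Estimate P(base | previous `order` bases) from background sequences.

    Hypothetical helper: the pseudocount smoothing is an assumption,
    not something the abstract specifies.
    """
    counts = defaultdict(lambda: dict.fromkeys(ALPHABET, 0.0))
    for seq in sequences:
        seq = seq.upper()
        for i in range(order, len(seq)):
            context, base = seq[i - order:i], seq[i]
            # Skip positions whose context or base is ambiguous (e.g. N).
            if base in ALPHABET and all(c in ALPHABET for c in context):
                counts[context][base] += 1
    model = {}
    # Enumerate every possible context so unseen contexts still get a row.
    for context in map("".join, product(ALPHABET, repeat=order)):
        row = counts[context]
        total = sum(row.values()) + pseudocount * len(ALPHABET)
        model[context] = {b: (row[b] + pseudocount) / total for b in ALPHABET}
    return model

def background_log_prob(segment, model, order=3):
    """Log-probability of segment[order:] under the background model.

    The first `order` bases serve only as context and are not scored.
    """
    segment = segment.upper()
    return sum(
        math.log(model[segment[i - order:i]][segment[i]])
        for i in range(order, len(segment))
    )
```

With `order=0` the model reduces to plain base frequencies, and with `order=3` each base is conditioned on the preceding trinucleotide. A motif sampler could compare a segment's probability under a motif model with its `background_log_prob` to decide how motif-like the segment is.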

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Bacillus subtilis / genetics
  • Bacillus subtilis / metabolism
  • Base Sequence
  • Binding Sites
  • Carrier Proteins
  • Conserved Sequence
  • Cyclic AMP Receptor Protein / metabolism
  • DNA / genetics*
  • DNA / metabolism
  • Escherichia coli / genetics
  • Escherichia coli / metabolism
  • Gene Expression Profiling / statistics & numerical data
  • Genes, Regulator*
  • Markov Chains
  • Models, Genetic
  • Oligonucleotide Array Sequence Analysis / statistics & numerical data
  • Saccharomyces cerevisiae / genetics
  • Saccharomyces cerevisiae / metabolism
  • Sequence Alignment / statistics & numerical data
  • Software*
  • TATA Box
  • rap1 GTP-Binding Proteins / metabolism

Substances

  • Carrier Proteins
  • Cyclic AMP Receptor Protein
  • DNA
  • rap1 GTP-Binding Proteins