Predicting bacterial transcription units using sequence and expression data

Bioinformatics. 2003;19 Suppl 1:i34-43. doi: 10.1093/bioinformatics/btg1003.


Motivation: A key aspect of elucidating gene regulation in bacterial genomes is identifying the basic units of transcription. We present a method, based on probabilistic language models, that we apply to predict operons, promoters and terminators in the genome of Escherichia coli K-12. Our approach has two key properties: (i) it provides a coherent set of predictions for related regulatory elements of various types and (ii) it takes advantage of both DNA sequence and gene expression data, including expression measurements from inter-genic probes.

Results: Our experimental results show that we are able to predict operons and localize promoters and terminators with high accuracy. Moreover, our models that use both sequence and expression data are more accurate than those that use only one of these two data sources.

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.
  • Validation Study

MeSH terms

  • Algorithms*
  • Artificial Intelligence
  • Codon
  • Escherichia coli / genetics*
  • Escherichia coli Proteins / genetics
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation, Bacterial / genetics
  • Models, Genetic
  • Models, Statistical
  • Oligonucleotide Array Sequence Analysis / methods*
  • Operator Regions, Genetic / genetics
  • Promoter Regions, Genetic / genetics
  • Regulatory Sequences, Nucleic Acid / genetics*
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Sequence Analysis, DNA / methods*
  • Software
  • Terminator Regions, Genetic / genetics
  • Transcription Factors / genetics*


Substances

  • Codon
  • Escherichia coli Proteins
  • Transcription Factors