Highly specific localization of promoter regions in large genomic sequences by PromoterInspector: a novel context analysis approach

J Mol Biol. 2000 Mar 31;297(3):599-606. doi: 10.1006/jmbi.2000.3589.


We present a new algorithm called PromoterInspector to locate eukaryotic polymerase II promoter regions in large genomic sequences with a high degree of specificity. PromoterInspector focuses on the genetic context of promoters, rather than their exact location. Application of PromoterInspector can serve as a crucial pre-processing step for other methods that locate promoters exactly or analyze them. PromoterInspector does not depend on heuristics, because it is based purely on libraries of IUPAC words extracted from training sequences by an unsupervised learning approach. We compared PromoterInspector to other in silico promoter prediction tools using the sequences from the review by J.W. Fickett. PromoterInspector compared favourably on Fickett's evaluation scheme. A true positive to false positive ratio of 2.3 was obtained, surpassing the best ratio of 0.6, reported for TSSG. The application of our method to several large genomic sequences of over 1.3 million base-pairs in total resulted in even more specific predictions. The coverage of annotated promoters was comparable to other in silico promoter prediction methods, while the true positive predictions increased by up to 100% of total matches. PromoterInspector scans 100 kb in less than one minute on a workstation, and thus is especially applicable for large genome analysis. The method is available at http://genomatix.gsf.de/cgi-bin/promoterinspector/promoterinspector.pl.
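The abstract does not describe how the IUPAC word libraries are built or applied, so the following is only a minimal sketch of the underlying idea of matching an IUPAC word (a short pattern written in the standard IUPAC nucleotide ambiguity codes) against a DNA sequence. The function names, the example words, and the notion of counting library hits in a window are illustrative assumptions, not PromoterInspector's actual procedure.

```python
# Standard IUPAC nucleotide ambiguity codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def word_matches(word, segment):
    """Return True if each base of segment is allowed by the IUPAC word."""
    return len(word) == len(segment) and all(
        base in IUPAC[code] for code, base in zip(word, segment)
    )


def find_word(word, sequence):
    """Return 0-based start positions where the IUPAC word matches."""
    word, sequence = word.upper(), sequence.upper()
    k = len(word)
    return [
        i
        for i in range(len(sequence) - k + 1)
        if word_matches(word, sequence[i:i + k])
    ]


def count_library_hits(library, window):
    """Count matches of every word of a (hypothetical) library in a window."""
    return sum(len(find_word(word, window)) for word in library)


# Hypothetical example: the word TATAWA matches TATAAA and TATATA.
positions = find_word("TATAWA", "GGTATAAAGGTATATACC")
hits = count_library_hits(["TATAWA", "GGY"], "GGTATAAAGGTATATACC")
```

Here `positions` holds the start offsets of each match, and `hits` is the total number of matches over all words in the toy library. A context-based method would presumably compare such counts across libraries derived from different sequence classes, but that step is an assumption and is not specified in the abstract.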

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • 3' Untranslated Regions / genetics
  • Algorithms*
  • Animals
  • Chromosomes / genetics
  • Computational Biology / methods*
  • Exons / genetics
  • False Positive Reactions
  • Genome*
  • Humans
  • Internet
  • Introns / genetics
  • Mice
  • Promoter Regions, Genetic / genetics*
  • RNA Polymerase II / physiology*
  • Reproducibility of Results
  • Sensitivity and Specificity


Substances

  • 3' Untranslated Regions
  • RNA Polymerase II