A comprehensive library of DNA-binding site matrices for 55 proteins applied to the complete Escherichia coli K-12 genome

J Mol Biol. 1998 Nov 27;284(2):241-54. doi: 10.1006/jmbi.1998.2160.


A major mode of gene regulation occurs via the binding of specific proteins to specific DNA sequences. The availability of complete bacterial genome sequences offers an unprecedented opportunity to describe networks of such interactions by correlating existing experimental data with computational predictions. Of the 240 candidate Escherichia coli DNA-binding proteins, about 55 have DNA-binding sites identified by DNA footprinting. We used these sites to construct recognition matrices, and then used the matrices to search the E. coli genomic sequence for additional binding sites. Many of these matrices show a strong preference for non-coding DNA. Discrepancies are identified between matrices derived from natural sites and those derived from SELEX (Systematic Evolution of Ligands by Exponential enrichment) experiments. We have constructed a database of these proteins and binding sites, called DPInteract (available at http://arep.med.harvard.edu/dpinteract).
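The abstract does not spell out how the recognition matrices are built or scored, so the following is only a minimal sketch of one common approach: a log-odds position weight matrix built from aligned sites and slid along both strands of a sequence. The uniform background, the pseudocount, the threshold, the toy sites, and all function names are assumptions made for illustration; none of them come from the paper or from DPInteract.

```python
import math
from collections import Counter

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def build_log_odds_matrix(sites, background=None, pseudocount=1.0):
    """Build a log2-odds matrix from equal-length, aligned, uppercase sites.

    Assumption: uniform background and a flat pseudocount; the paper may
    well have used a different scoring scheme.
    """
    if background is None:
        background = {b: 0.25 for b in BASES}
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    matrix = []
    for pos in range(width):
        counts = Counter(site[pos] for site in sites)
        column = {
            b: math.log2(((counts[b] + pseudocount) / total) / background[b])
            for b in BASES
        }
        matrix.append(column)
    return matrix


def reverse_complement(seq):
    """Return the reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def score_window(matrix, window):
    """Sum the per-position log-odds scores for one window."""
    return sum(column[base] for column, base in zip(matrix, window))


def scan(matrix, sequence, threshold):
    """Return (start, strand, score) for every window scoring >= threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if set(window) - set(BASES):
            continue  # skip windows with ambiguous bases such as N
        for strand, seq in (("+", window), ("-", reverse_complement(window))):
            score = score_window(matrix, seq)
            if score >= threshold:
                hits.append((start, strand, score))
    return hits


# Toy data, invented for illustration only (not real footprinted sites).
toy_sites = ["TGTGA", "TGTGA", "TGTGC", "TGAGA"]
toy_matrix = build_log_odds_matrix(toy_sites)
hits = scan(toy_matrix, "AAATGTGACCC", threshold=5.0)
```

In a genome-scale search, the hit coordinates could then be compared against gene annotations to see whether high-scoring windows fall preferentially in non-coding DNA, which is the kind of preference the abstract reports.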

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bacterial Proteins / metabolism*
  • Binding Sites
  • Computational Biology / methods*
  • DNA Footprinting
  • DNA, Bacterial / metabolism*
  • DNA-Binding Proteins / metabolism*
  • Databases, Factual
  • Escherichia coli / genetics*
  • Genome, Bacterial
  • Molecular Sequence Data
  • Pattern Recognition, Automated
  • Protein Binding
  • Software

Substances

  • Bacterial Proteins
  • DNA, Bacterial
  • DNA-Binding Proteins

Associated data

  • GENBANK/U00096