Genomic repertoires of DNA-binding transcription factors across the tree of life

Nucleic Acids Res. 2010 Nov;38(21):7364-77. doi: 10.1093/nar/gkq617. Epub 2010 Jul 30.


Sequence-specific transcription factors (TFs) are important to genetic regulation in all organisms because they recognize and directly bind to regulatory regions on DNA. Here, we survey and summarize the TF resources available. We outline the organisms for which TF annotation is provided, and discuss the criteria and methods that different databases use to annotate TFs. Using genomic TF repertoires from ∼700 genomes across the tree of life, covering Bacteria, Archaea and Eukaryota, we review TF abundance with respect to the number of genes, as well as the structural complexity of TFs in diverse lineages. While typical eukaryotic TFs are longer than the average eukaryotic protein, the inverse is true for prokaryotes. Only in eukaryotes does the same family of DNA-binding domain (DBD) occur multiple times within one polypeptide chain. This potentially increases the length and diversity of DNA-recognition sequences by reusing DBDs from the same family. We examined the increase in TF abundance with the number of genes in genomes, using the largest set of prokaryotic and eukaryotic genomes to date. As pointed out before, the number of prokaryotic TFs increases faster than linearly with the number of genes. We further observe a similar relationship in eukaryotic genomes, though with a slower increase in TFs.
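
One way to make "faster than linearly" concrete is to assume a power-law relationship between TF count T and gene count G, T ≈ a·G^b, where an exponent b > 1 means TFs grow faster than the gene count. The abstract does not state this functional form or any fitted values, so the model, the function name `fit_power_law`, and the numbers below are illustrative assumptions only. A minimal sketch that estimates b by least squares in log-log space, using synthetic data:

```python
import math


def fit_power_law(gene_counts, tf_counts):
    """Fit T = a * G**b by ordinary least squares on log-transformed values.

    Returns (a, b). Hypothetical helper for illustration; not the method
    used in the paper.
    """
    xs = [math.log(g) for g in gene_counts]
    ys = [math.log(t) for t in tf_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope of the regression line in log-log space is the exponent b.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    # Intercept gives log(a).
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic data, NOT taken from the paper: TF counts are generated with an
# arbitrarily chosen exponent of 1.5 so the fit has a known answer.
genes = [500, 1000, 2000, 4000, 8000]
tfs = [0.001 * g ** 1.5 for g in genes]

a, b = fit_power_law(genes, tfs)
superlinear = b > 1  # True when TFs grow faster than linearly with genes
```

With real genome data, the same fit could be run separately on prokaryotic and eukaryotic genomes and the two exponents compared; the slower eukaryotic increase described above would then appear as a smaller fitted b.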

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Animals
  • Catalogs as Topic
  • DNA-Binding Proteins / chemistry
  • Databases, Genetic
  • Eukaryota / genetics
  • Gene Duplication
  • Genome, Archaeal
  • Genome, Bacterial
  • Genomics
  • Protein Structure, Tertiary
  • Transcription Factors / chemistry
  • Transcription Factors / classification*
  • Transcription Factors / genetics

Substances


  • DNA-Binding Proteins
  • Transcription Factors