, 42 (Database issue), D156-60

CollecTF: A Database of Experimentally Validated Transcription Factor-Binding Sites in Bacteria


Sefa Kiliç et al. Nucleic Acids Res.


The influx of high-throughput data and the need for complex models to describe the interaction of prokaryotic transcription factors (TF) with their target sites pose new challenges for TF-binding site databases. CollecTF compiles data on experimentally validated, naturally occurring TF-binding sites across the Bacteria domain, placing a strong emphasis on the transparency of the curation process, the quality and availability of the stored data and fully customizable access to its records. CollecTF integrates multiple sources of data automatically and openly, allowing users to dynamically redefine binding motifs and their experimental support base. Data quality and currency are fostered in CollecTF by adopting a sustainable model that encourages direct author submissions in combination with in-house validation and curation of published literature. CollecTF entries are periodically submitted to NCBI for integration into RefSeq complete genome records as link-out features, maximizing the visibility of the data and enriching the annotation of RefSeq files with regulatory information. Seeking to facilitate comparative genomics and machine-learning analyses of regulatory interactions, in its initial release CollecTF provides domain-wide coverage of two TF families (LexA and Fur), as well as extensive representation for a clinically important bacterial family, the Vibrionaceae.


Figure 1.
Schematic representation illustrating the CollecTF data structure, curation and navigation processes. (Left panel) The curation table is the pivotal element of the relational design in CollecTF, providing a central link to all the other tables in the database (Supplementary Figure S1) and establishing a link between reported TF-binding sites, the evidence supporting them, their regulatory effects on genes and their mapped instances in a reference genome. (Right panel) Navigation is initiated by browsing or customized search, leading to a dynamically generated report that can be cumulative or individualized for each TF/species pair (Supplementary Figure S2). Motif alignments and logos are provided for visualization, together with export functions to FASTA and flat-file formats. Users can link out to specific site reports and to curation reports to evaluate all the supporting evidence for reported sites and the genome mapping process.
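The curation-centred design and FASTA export described in this caption can be pictured with a minimal sketch. It assumes a much simplified layout: the `Site` and `Curation` classes, their field names, the `to_fasta` helper, and all identifiers and sequences below are hypothetical placeholders, not the CollecTF schema, code or data.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Site:
    """A reported TF-binding site (hypothetical, simplified fields)."""
    site_id: str
    tf: str
    species: str
    sequence: str


@dataclass(frozen=True)
class Curation:
    """Hypothetical central record: ties one site to one publication,
    the techniques supporting it, and the genes it is reported to regulate."""
    site: Site
    publication: str
    techniques: Tuple[str, ...]
    regulated_genes: Tuple[str, ...]


def to_fasta(curations: List[Curation], tf: str,
             species: Optional[str] = None) -> str:
    """Export the sites for one TF as FASTA text.

    With species=None the report is cumulative over all species;
    otherwise it is restricted to a single TF/species pair.
    A site supported by several curations is written only once.
    """
    seen = set()
    lines = []
    for cur in curations:
        site = cur.site
        if site.tf != tf:
            continue
        if species is not None and site.species != species:
            continue
        if site.site_id in seen:
            continue
        seen.add(site.site_id)
        lines.append(f">{site.site_id} {site.tf} {site.species}")
        lines.append(site.sequence)
    return "\n".join(lines)


# Made-up placeholder records, for illustration only.
site_1 = Site("site_1", "LexA", "species_A", "ACGTACGTACGT")
site_2 = Site("site_2", "LexA", "species_B", "ACGTTCGTACGA")
site_3 = Site("site_3", "Fur", "species_A", "GATTACAGATTACA")

CURATIONS = [
    Curation(site_1, "pub_1", ("technique_X",), ("gene_1",)),
    Curation(site_1, "pub_2", ("technique_Y",), ("gene_2",)),
    Curation(site_2, "pub_3", ("technique_X",), ()),
    Curation(site_3, "pub_4", ("technique_Z",), ("gene_3",)),
]

if __name__ == "__main__":
    print(to_fasta(CURATIONS, "LexA"))               # cumulative report
    print(to_fasta(CURATIONS, "LexA", "species_B"))  # one TF/species pair
```

Keeping the evidence on the curation record rather than on the site is what would let the same site accumulate support from several publications without being duplicated in the export.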
Figure 2.
(A) Sequence logo for LexA-binding sites in the Firmicutes (top) and in Bacillus subtilis (bottom). (B) Sequence logo for Fur-binding sites with experimental evidence of binding (top) or experimental evidence of regulation (bottom). Both examples are extracted from dynamically generated CollecTF reports and illustrate the ability to customize queries and the fluidity inherent to the concept of TF-binding motif in CollecTF.
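As a rough illustration of how a motif can change with the subset of sites chosen to define it (as in panel B), the sketch below filters toy site records by evidence type and computes the per-position information content that a sequence logo displays. It assumes equal-length aligned DNA sites, a uniform background and no small-sample correction. The record layout, the evidence labels and the sequences are invented for the example and do not come from CollecTF.

```python
import math
from collections import Counter
from typing import Iterable, List, Set, Tuple

# Each toy record: (aligned site sequence, set of evidence labels).
# Labels and sequences are made up for illustration.
Record = Tuple[str, Set[str]]


def select_sites(records: Iterable[Record], evidence: str) -> List[str]:
    """Return the sequences of records supported by the given evidence label."""
    return [seq for seq, labels in records if evidence in labels]


def information_content(sites: List[str]) -> List[float]:
    """Per-column information content in bits: 2 - Shannon entropy.

    Assumes a uniform background over A, C, G, T and applies no
    small-sample correction.
    """
    if not sites:
        raise ValueError("at least one site is required")
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("sites must be aligned to the same length")
    result = []
    for column in zip(*sites):
        total = len(column)
        counts = Counter(column)
        entropy = -sum((n / total) * math.log2(n / total)
                       for n in counts.values())
        result.append(2.0 - entropy)
    return result


RECORDS: List[Record] = [
    ("AA", {"binding"}),
    ("CA", {"binding", "regulation"}),
    ("GA", {"binding"}),
    ("TA", {"binding"}),
    ("CC", {"regulation"}),
]

if __name__ == "__main__":
    for label in ("binding", "regulation"):
        sites = select_sites(RECORDS, label)
        print(label, len(sites), information_content(sites))
```

Because the motif is recomputed from whichever sites pass the filter, the same TF can yield different logos for different evidence types or taxonomic scopes.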
Figure 3.
Snapshot of the site report page for a Pseudomonas aeruginosa LexA-binding site, illustrating the integration of supporting experimental evidence and including out-links to curations, publications, technique descriptions and NCBI Gene records. Like all other site report pages, this record is directly accessible through its db_xref link at
Figure 4.
Detail of the CollecTF generated record for the Bdellovibrio bacteriovorus complete genome (RefSeq accession NC_005363.1) showing the /bound_moiety feature corresponding to a LexA-binding site upstream of the lexA gene (Bd3511). The feature details the experimental evidence for the site with associated PubMed identifiers and provides a db_xref out-link to CollecTF.
