J Bacteriol. 2015 Aug 1;197(15):2454-7.
doi: 10.1128/JB.00031-15. Epub 2015 May 26.

Every Site Counts: Submitting Transcription Factor-Binding Site Information Through the CollecTF Portal

Free PMC article

Ivan Erill. J Bacteriol.

Abstract

Experimentally verified transcription factor-binding sites represent an information-rich and highly applicable data type that aptly summarizes the results of time-consuming experiments and inference processes. Currently, there is no centralized repository for this type of data, which is routinely embedded in articles and extremely hard to mine. CollecTF provides the first standardized resource for submission and deposition of these data into the NCBI RefSeq database, maximizing their accessibility and prompting the community to adopt direct submission policies.

Figures

FIG 1
Example of the customizable query system implemented in CollecTF. (A to D) Dynamically generated sequence logos (28) for queries on Fur-binding sites detected through EMSA or DNase footprinting in Gammaproteobacteria (A), reported by chromatin immunoprecipitation with microarray technology (ChIP-chip) or ChIP-seq in Gammaproteobacteria (B), reported by ChIP-chip or ChIP-seq in Vibrio cholerae (C) and as reported for E. coli (D) in a reference manuscript (29). To generate logos, sites matching the query are dynamically aligned using LASAGNA to define site orientation and the window of conservation above background (30). All the site data and metadata used to generate the logos are available for download in a variety of export formats (e.g., FASTA, CSV, or PSSM).
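As a rough illustration of what sits behind a sequence logo and a PSSM-style export, the sketch below builds per-position base counts from a set of already-aligned sites and computes the information content of each column. It is not CollecTF's implementation: the toy sequences are hypothetical (they are not real Fur-binding sites), the function names are invented for this example, a uniform background over A, C, G and T is assumed, and no small-sample correction is applied.

```python
import math
from collections import Counter

# Hypothetical, already-aligned toy sites (NOT real Fur-binding sites).
sites = ["ACGT", "ACGA", "ACGT", "TCGT"]


def count_matrix(aligned_sites):
    """Return one Counter of base counts per alignment column."""
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all sites must have the same length")
    return [Counter(site[i] for site in aligned_sites) for i in range(length)]


def column_information(counts, alphabet="ACGT"):
    """Information content (bits) of one column, assuming a uniform background."""
    total = sum(counts[base] for base in alphabet)
    entropy = 0.0
    for base in alphabet:
        p = counts[base] / total
        if p > 0:
            entropy -= p * math.log2(p)
    # Maximum entropy for a 4-letter alphabet is log2(4) = 2 bits.
    return 2.0 - entropy


matrix = count_matrix(sites)
information = [column_information(column) for column in matrix]
```

Fully conserved columns reach the 2-bit maximum, while variable columns score lower; a logo stacks the letters of each column to that total height.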
FIG 2
Details of the CollecTF site annotation step, where submitters select the specific techniques used to identify sites, as well as the mode of action and conformation, if known, of the transcription factor. Annotation is performed on sites entered as sequences or coordinates and previously mapped to the RefSeq genome record (inset). Extensive documentation for the curation process is available on the CollecTF website (http://www.collectf.org/static/CollecTF_submission_guide.pdf) (24).
FIG 3
Details of a CollecTF “protein_bind” feature extracted from the Pseudomonas aeruginosa PAO1 genome sequence (NC_002516.2). The transcription factor (LasR) is identified as the “/bound_moiety,” and its protein accession number is provided in the “/note” field, together with regulated genes. The experimental support for this LasR-binding site comes from several lines of evidence reported using the “/experiment” tag. The PubMed identifiers (PMID) for the scientific papers providing such evidence are listed next to the evidence description. A “/db_xref” field provides a link to the CollecTF record to explore the data integration and curation process for the reported site.
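The qualifiers named in this caption could be pulled out of a flat-file feature block roughly as follows. This is a minimal sketch under stated assumptions: the feature text is a hypothetical stand-in written in GenBank flat-file style (the coordinates and all qualifier values except the LasR name are placeholders, not the contents of the actual NC_002516.2 record), `parse_qualifiers` is a name invented here, and only single-line quoted values are handled, whereas real records may wrap long values across lines.

```python
import re

# Hypothetical feature block; coordinates and values are placeholders,
# not copied from the real NC_002516.2 record.
feature_text = '''     protein_bind    100..119
                     /bound_moiety="LasR"
                     /note="placeholder note"
                     /experiment="placeholder evidence description 1"
                     /experiment="placeholder evidence description 2"
                     /db_xref="placeholder cross-reference"
'''

# Matches lines of the form: /key="value" (single-line values only).
QUALIFIER = re.compile(r'^\s*/(\w+)="([^"]*)"\s*$', re.MULTILINE)


def parse_qualifiers(text):
    """Map each qualifier key to the list of its values, in order of appearance."""
    qualifiers = {}
    for key, value in QUALIFIER.findall(text):
        qualifiers.setdefault(key, []).append(value)
    return qualifiers


qualifiers = parse_qualifiers(feature_text)
```

Values are kept in lists because a qualifier such as “/experiment” can appear more than once on the same feature. For real records, an established GenBank parser is likely a better choice than a hand-written regular expression.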
