, 2020

ChIPSummitDB: A ChIP-seq-based Database of Human Transcription Factor Binding Sites and the Topological Arrangements of the Proteins Bound to Them

Erik Czipa et al. Database (Oxford).

Abstract

ChIP-seq reveals genomic regions where proteins, e.g. transcription factors (TFs), interact with DNA. A substantial fraction of these regions, however, do not contain the cognate binding site for the TF of interest. This phenomenon might be explained by protein-protein interactions and co-precipitation of interacting gene regulatory elements. We uniformly processed 3727 human ChIP-seq data sets and determined the cistromes of 292 TFs, as well as the distances between the TF binding motif centers and the ChIP-seq peak summits. ChIPSummitDB enables the analysis of ChIP-seq data using multiple approaches. The 292 cistromes and corresponding ChIP-seq peak sets can be browsed in GenomeView. Overlapping SNPs can be inspected in dbSNPView. Most importantly, the MotifView and PairShiftView pages show, respectively, the average distance between motif centers and overlapping ChIP-seq peak summits and the distribution of those distances. In addition to providing a comprehensive collection of human TF binding sites, the ChIPSummitDB database and web interface allow the topological arrangement of TF complexes to be examined genome-wide. ChIPSummitDB is freely accessible at http://summit.med.unideb.hu/summitdb/. The database will be regularly updated and extended with newly available human and mouse ChIP-seq data sets.
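
The central quantity in the abstract, the distance between a motif center and an overlapping peak summit, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name, the `(center, strand)` input layout, the 50 bp cutoff and the strand-based sign flip are all assumptions made for illustration, and coordinates are taken to lie on a single chromosome.

```python
import bisect


def summit_motif_distances(summits, motif_sites, max_dist=50):
    """Signed distance from each summit to its nearest motif center.

    summits      -- summit coordinates (bp) on one chromosome
    motif_sites  -- (center, strand) tuples, strand being "+" or "-"
    max_dist     -- hypothetical cutoff; summits farther away are dropped
    """
    sites = sorted(motif_sites)
    centers = [center for center, _ in sites]
    distances = []
    for summit in summits:
        # Only the two centers flanking the summit can be the nearest one.
        i = bisect.bisect_left(centers, summit)
        best = None
        for j in (i - 1, i):
            if 0 <= j < len(sites):
                if best is None or abs(summit - centers[j]) < abs(summit - centers[best]):
                    best = j
        if best is None:
            continue  # no motif sites were supplied
        center, strand = sites[best]
        distance = summit - center
        # Assumption: distances are reported in motif orientation,
        # so the sign is flipped for motifs on the minus strand.
        if strand == "-":
            distance = -distance
        if abs(distance) <= max_dist:
            distances.append(distance)
    return distances
```

For example, summits at 103 and 198 near motif sites `(100, "+")` and `(200, "-")` give distances of 3 and 2, while a summit at 500 is discarded by the cutoff.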

Figures

Figure 1
Schematic overview of ChIP-seq data processing and imported content from MySQL. The analysis steps and data conversions are marked with thick arrows. The uploaded results/files are represented with dashed lines. The vast majority of the processed data are available in ChIPSummitDB, including the predicted peak regions, optimized JASPAR CORE PWMs, identified TFBSs and calculated protein position information.
Figure 2
The distance distribution of FOXA1 summits relative to the motif centers of FOXA1 binding sites. The horizontal axis represents the distance of summits in different cell lines [T47D (SRA ID: SRX100454, red curve), HepG2 (SRA ID: SRX100506, blue curve) and VCaP (SRA ID: SRX497612, green curve)] relative to the FOXA1 motif center. The vertical axis represents the distance frequencies. A rolling mean with a 5 bp window was applied to smooth the frequency curves. The distance between the maximum (main summit, at −3 bp) and the shoulder (7 bp) is ~10 bp. Element numbers in the table indicate the number of peak regions obtained in a ChIP-seq experiment that overlap with a particular consensus motif binding site set. The figure is adapted from the ChIPSummitDB website: http://summit.med.unideb.hu/summitdb/paired_shift_view.php?exp1=419&exp2=1960&exp3=3681&motive=FOXA1&motifid=77&limit=25&low_limit=-25&formminid=1&formmaxid=10000&mnelem=100&formmaxelem=120000.
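
The smoothing described in this caption can be sketched as follows. This is an illustrative reconstruction, not the database's own code: the function names are hypothetical, the -25 to 25 bp range is borrowed from the `low_limit` and `limit` parameters of the URL above, and a centered window that is shortened at the edges is an assumption, since the caption only states that a 5 bp rolling mean was applied.

```python
from collections import Counter


def distance_frequencies(distances, low=-25, high=25):
    """Relative frequency of each integer distance in [low, high]."""
    counts = Counter(d for d in distances if low <= d <= high)
    total = sum(counts.values())
    return [counts[pos] / total if total else 0.0
            for pos in range(low, high + 1)]


def rolling_mean(values, window=5):
    """Centered rolling mean; the window is truncated at both ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed
```

Calling `rolling_mean(distance_frequencies(distances))` yields one smoothed curve per experiment, comparable in spirit to the colored curves in the figure.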
Figure 3
The standard deviation of the distances between peak summits and binding site centers indicates DNA–protein proximity. Each point represents the average summit position from a single ChIP-seq experiment. The X-axis represents the distance from the binding site center, whose position is marked by '0' in the binding motif logo. The standard deviation of the summit-motif center distances is shown on the Y-axis. (A) The proteins that interact with YY1 binding sites are arranged in three groups. The lowest SD (between 16 and 22) belongs to the YY1 protein, which binds directly to the YY1 DNA binding motifs. In the second layer, CTCF and cohesin subunit (RAD21, SA1) ChIP-seq signals are the most common. The third group, with high SD (above 27), represents a diverse population consisting of ChIP-seq experiments with different protein targets and more than 1000 overlapping peaks. The P300 and MAX proteins from Group 3 are labeled in red and green, respectively. The figure was slightly modified and adapted from the ChIPSummitDB website: http://summit.med.unideb.hu/summitdb/motif_view.php?maxid=10000&minid=1&mnelem=1000&mxelem=120000&motive=YY1. (B) In the case of the CTCF binding sites, only two layers can be distinguished. The first group contains the directly interacting CTCF, RAD21 and SA1 proteins, while YY1, P300, MAX and other proteins fall into the second group. Note that the relative position of the YY1 and CTCF proteins to each other is the same on both plots. The figure was slightly modified and adapted from the ChIPSummitDB website: http://summit.med.unideb.hu/summitdb/motif_view.php?maxid=10000&minid=1&mnelem=5000&mxelem=120000&motive=CTCF
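
Each point in these scatterplots corresponds to a mean and a standard deviation computed over one experiment's summit-to-motif-center distances. The sketch below shows that reduction and a rough layer assignment. The function names and layer labels are hypothetical, the use of the sample standard deviation is an assumption, and the cutoffs of 22 and 27 are simply read off the description of panel (A); they are not presented here as a rule used by the database and would not transfer to other motifs.

```python
from statistics import mean, stdev


def summarize_experiment(distances):
    """Return (mean, SD) of the distances from one ChIP-seq experiment.

    Assumption: the sample standard deviation is used; the caption
    does not say which estimator the database applies.
    """
    return mean(distances), stdev(distances)


def layer_for_sd(sd, direct_max=22.0, distant_min=27.0):
    """Assign an illustrative layer from the SD alone.

    The default cutoffs mirror the YY1 panel only and are assumptions.
    """
    if sd <= direct_max:
        return "direct"
    if sd > distant_min:
        return "distant"
    return "intermediate"
```

Under these assumed cutoffs, an experiment with an SD of 18 would be labeled "direct", one with an SD of 25 "intermediate", and one with an SD of 30 "distant".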
Figure 4
Binding-site-based analysis of the topological arrangements of TF–DNA complexes, as visualized in MotifView and PairShiftView. The plots show the preferred positions of different proteins on (A) GATA1::TAL1 binding sites and on (B) NFYB binding sites. The scatterplots follow the same logic as in Figure 3. The figures derive from ChIPSummitDB, although the points were filtered to show only the presented factors. GATA1::TAL1: http://summit.med.unideb.hu/summitdb/motif_view.php?maxid=2000&minid=1&mnelem=1000&mxelem=120000&motive=GATA1%3A%3ATAL1 NFYB: http://summit.med.unideb.hu/summitdb/motif_view.php?maxid=2000&minid=1&mnelem=2000&mxelem=120000&motive=NFYB. The histograms (at right) show the distribution of the summits relative to the midpoint (motif center). The horizontal axis shows the distance from the motif center, measured in base pairs. The vertical axis displays the frequency of summits at the given positions. Each ChIP-seq experiment is represented by a frequency curve, smoothed with a rolling mean with a 5 bp window: (A) GATA1::TAL1: blue: SRX386203, red: SRX386202; (B) NFYB: red: SRX037419, blue: SRX150508, green: SRX100471. Element numbers in the tables indicate the number of peak regions obtained in a ChIP-seq experiment that overlap with a particular consensus motif binding site set. The figures are adapted from the ChIPSummitDB website: GATA1::TAL1: http://summit.med.unideb.hu/summitdb/paired_shift_view.php?exp1=218&exp2=220&exp3=undefined&motive=GATA1::TAL1&motifid=89&limit=40&low_limit=-40&formminid=1&formmaxid=2000&mnelem=500&formmaxelem=120000 NFYB: http://summit.med.unideb.hu/summitdb/paired_shift_view.php?exp1=2301&exp2=761&exp3=1597&motive=NFYB&motifid=175&limit=40&low_limit=-40&mnelem=2000.
