Cell, 128 (6), 1231-45

Analysis of the Vertebrate Insulator Protein CTCF-binding Sites in the Human Genome


Tae Hoon Kim et al. Cell.


Insulator elements affect gene expression by preventing the spread of heterochromatin and by restricting transcriptional enhancers from activating unrelated promoters. In vertebrates, insulator function requires association with the CCCTC-binding factor (CTCF), a protein that recognizes long and diverse nucleotide sequences. Although insulators are critical in gene regulation, only a few have been reported. Here, we describe 13,804 CTCF-binding sites in potential insulators of the human genome, discovered experimentally in primary human fibroblasts. Most of these sequences are located far from transcriptional start sites, and their distribution correlates strongly with that of genes. The majority fit a highly conserved consensus motif that is suitable for predicting possible CTCF-driven insulators in other vertebrate genomes. In addition, CTCF localization is largely invariant across different cell types. Our results provide a resource for investigating insulator function and other possible general and evolutionarily conserved activities of CTCF sites.


Figure 1. Chromosomal distribution of CTCF binding sites
(a) ChIP-chip analysis results for the IGF2/H19 locus are shown. (b) A view of CTCF binding at the H19/IGF2 imprint control region. (c) Correlation analysis of the number of CTCF, ER and p53 binding sites with the gene number on each chromosome. (d) Correlation analysis of the number of CTCF, ER and p53 binding sites with the length of each chromosome.
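Panels (c) and (d) rest on a per-chromosome correlation between binding-site counts and either gene number or chromosome length. The following is a minimal Python sketch of that kind of calculation, assuming a Pearson coefficient (the caption does not name the statistic used). The function name `pearson_r` and the toy counts are hypothetical and are not data from the paper.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-chromosome counts for illustration only (not from the paper).
toy_gene_counts = [2000, 1200, 1100, 750, 900]
toy_site_counts = [1400, 900, 800, 500, 650]

r = pearson_r(toy_gene_counts, toy_site_counts)
print(f"gene number vs. binding sites: r = {r:.3f}")
```

Swapping the gene counts for chromosome lengths in base pairs would give the comparison described for panel (d).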
Figure 2. Distribution of CTCF binding sites relative to genes
(a) A chromosomal view of the gene and CTCF binding site density of chromosome 11 is shown. Arrows indicate regions within the chromosome where the overall correlation of CTCF binding sites and gene number deviates from the average. (b) A histogram summarizing the distribution of CTCF binding sites relative to the 5′ end of Known Genes. (c) A pie chart of CTCF binding sites mapping to exons, introns, promoters (within 2.5 Kb of the start sites), and intergenic regions of the genome. (d) Depletion of CTCF binding sites at clusters of related genes. A cluster of olfactory receptor (OR) genes is bounded by a pair of CTCF binding sites, indicated by long red vertical lines. (e) An example of CTCF binding sites punctuating the alternate promoters in the protocadherin gamma locus. Red vertical lines indicate CTCF binding sites. The blue bars within the top panel show the relative expression of probes that map to the locus. The width of each bar represents the length of each gene.
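The promoter category in panel (c) is defined by distance to a transcription start site (TSS), with a 2.5 Kb cutoff. A rough Python sketch of that distance test is given below. It assumes a single chromosome, ignores strand, and uses a sorted list of TSS coordinates. The function names and coordinates are hypothetical, and this is not the authors' annotation pipeline.

```python
from bisect import bisect_left

PROMOTER_WINDOW = 2500  # bp; the 2.5 Kb cutoff stated in the caption


def distance_to_nearest_tss(position, sorted_tss):
    """Absolute distance (bp) from a site to the closest TSS on one chromosome.

    `sorted_tss` must be sorted in ascending order.
    """
    if not sorted_tss:
        raise ValueError("at least one TSS coordinate is required")
    i = bisect_left(sorted_tss, position)
    # Only the TSS just before and just after the site can be the nearest one.
    candidates = sorted_tss[max(i - 1, 0):i + 1]
    return min(abs(position - tss) for tss in candidates)


def is_promoter_site(position, sorted_tss, window=PROMOTER_WINDOW):
    """True if the site lies within `window` bp of any TSS."""
    return distance_to_nearest_tss(position, sorted_tss) <= window


# Hypothetical coordinates for illustration only.
toy_tss = [10_000, 50_000]
for site in (12_000, 30_000, 51_500):
    label = "promoter" if is_promoter_site(site, toy_tss) else "non-promoter"
    print(site, distance_to_nearest_tss(site, toy_tss), label)
```

Separating exonic, intronic and intergenic sites would additionally require gene and exon intervals, which this sketch leaves out.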
Figure 3. CTCF binding sites are characterized by a 20-mer motif
(a) DNA logo (Workman et al., 2005) representing the CTCF binding motif defined from the ChIP-on-chip experiment, shown together with the previously reported consensus CTCF binding sites (Bell and Felsenfeld, 2000). The height of each letter represents the relative frequency of occurrence of the nucleotide at each position. (b) Distribution of high-scoring motifs within the experimentally defined CTCF binding sites. Yellow horizontal lines represent each CTCF binding site, and short blue lines represent the position of a high-scoring 20-mer motif found within the CTCF binding sites. (c) EMSA results for 12 CTCF (WT) and the corresponding shuffled (SH) probes (Supplemental Table 7) showing that 11 of 12 motifs found within the CTCF binding sites are specifically recognized by recombinant CTCF protein.
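Locating a high-scoring motif inside a binding site, as in panel (b), is commonly done by sliding a position weight matrix (PWM) along the sequence. The Python sketch below illustrates that general technique under stated assumptions: a log-odds matrix with a pseudocount and a uniform background, built from a few toy 4-mers. The toy sites are hypothetical and are not the CTCF 20-mer motif, and the scoring scheme is not necessarily the one the authors used.

```python
from math import log2

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM (one dict per position) from aligned, equal-length sites."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(width):
        column = [site[pos] for site in sites]
        pwm.append({
            base: log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm


def score_window(pwm, window):
    """Sum of per-position log-odds scores; the window may contain only A, C, G, T."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window on the given strand."""
    width = len(pwm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    return max(
        (score_window(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


# Toy aligned sites for illustration only (not the CTCF motif).
toy_pwm = build_pwm(["ACGT", "ACGT", "ACGA"])
score, start = best_hit(toy_pwm, "TTACGTTT")
print(f"best window starts at {start} with score {score:.2f}")
```

Scoring a shuffled version of the motif in the same way gives a simple background for comparison, analogous in spirit to the shuffled (SH) probes in panel (c).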
Figure 4. CTCF recognition sites are highly conserved in other vertebrates
(a) Distribution of CTCF binding motifs found in other vertebrate genomes is compared to the frequency of a randomly shuffled CTCF motif in each genome. (b) Venn diagram of computationally predicted CTCF binding sites in the human genome that are conserved in other vertebrates. The alignments on the right are examples of how each motif with different levels of conservation aligns to the corresponding sequences in other species. (c) EMSA results for 2 CTCF (WT) binding sites predicted in the chicken genome and the corresponding shuffled (SH) probes (Supplemental Table 9).
Figure 5. Comparison of CTCF binding in two cell types
(a) Representative view of CTCF binding in IMR90 and U937 cells within the ENCODE regions. The first panel lists all known genes within the region. The second and third panels show the CTCF binding data within the region for the IMR90 and U937 cells, respectively. The fourth panel shows the predicted CTCF binding sites based on the 20-mer motif. (b) A Venn diagram showing the overlap of CTCF binding in IMR90 and U937 cells at the confidence level P < 0.000001. (c) Validation of three cell-type-specific sites by quantitative real-time PCR (Supplemental Table 10).
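The Venn diagram in panel (b) amounts to counting which binding sites in one cell type overlap a site in the other. A minimal Python sketch of such an overlap count follows. It assumes sites are half-open `(chrom, start, end)` intervals and counts any shared base as an overlap. The interval representation, the function names and the toy coordinates are hypothetical, and the paper's own overlap criterion may differ.

```python
def overlaps(a, b):
    """True if two half-open intervals (chrom, start, end) share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def venn_counts(sites_a, sites_b):
    """Count shared and cell-type-specific sites.

    Shared sites are counted from each side separately, because one site in A
    can overlap several sites in B (and vice versa).
    """
    shared_a = sum(any(overlaps(a, b) for b in sites_b) for a in sites_a)
    shared_b = sum(any(overlaps(b, a) for a in sites_a) for b in sites_b)
    return {
        "only_a": len(sites_a) - shared_a,
        "shared_a": shared_a,
        "shared_b": shared_b,
        "only_b": len(sites_b) - shared_b,
    }


# Hypothetical intervals for illustration only (not data from the paper).
toy_cell_type_a = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
toy_cell_type_b = [("chr1", 150, 250), ("chr2", 300, 400)]

print(venn_counts(toy_cell_type_a, toy_cell_type_b))
```

The pairwise comparison is quadratic in the number of sites; for genome-scale lists, sorting by coordinate or using an interval index would be the usual refinement.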
Figure 6. CTCF binding sites show a unique pattern of nucleotide changes during evolution
Nucleotide changes observed within the mapped CTCF motifs in all available vertebrate genomes. The distribution of base changes observed in the CTCF binding sites is plotted along the 20-mer motif.
