Review
Brief Bioinform. 2017 Mar 1;18(2):279-290.
doi: 10.1093/bib/bbw023.

Recent Advances in ChIP-seq Analysis: From Quality Management to Whole-Genome Annotation

Free PMC article

Ryuichiro Nakato et al. Brief Bioinform.

Abstract

Chromatin immunoprecipitation followed by sequencing (ChIP-seq) analysis can detect protein/DNA-binding and histone-modification sites across an entire genome. Recent advances in sequencing technologies and analyses enable us to compare hundreds of samples simultaneously; such large-scale analysis has the potential to reveal high-dimensional interrelationships among regulatory elements and to annotate novel functional genomic regions de novo. Because many experimental considerations are relevant to the choice of a method in a ChIP-seq analysis, the overall design and quality management of the experiment are of critical importance. This review offers guiding principles of computation and sample preparation for ChIP-seq analyses, highlighting the validity and limitations of the state-of-the-art procedures at each step. We also discuss the latest challenges of single-cell analysis that will encourage a new era in this field.

Keywords: chromatin immunoprecipitation; differential analysis; experimental design; large-scale analysis; quality management; single-cell analysis.

Figures

Figure 1.
ChIP-seq analysis workflow. Boxes indicate the steps involved in ChIP-seq analyses for the various aims discussed in this review. The considerations for each step are itemized. (A) Sample preparation, sequencing and mapping. This procedure is common to both (B) and (C). (B) Small-scale analysis (single or a few samples). In this case, the peak-calling strategy and parameters can be adjusted to each sample’s properties. (C) Large-scale analysis (many samples). Left rectangles indicate the different experiments (e.g. the same analysis for different cell types). Because integrative analysis is sensitive to the quality of input samples and adjusting samples one by one is difficult, objective quality metrics for multilateral quantitative assessment are necessary to filter poor-quality data automatically.
Figure 2.
Statistics and visualization of ChIP-seq analysis for human K562 cells. A representative data set from the ENCODE consortium [45]. The sequenced read files (fastq) and the reference peak lists (detected by Scripture [57] under the assumption of uniform background signal) were downloaded from GEO under accession number GSE29611. The fastq files were mapped onto the human genome (UCSC hg19) using Bowtie version 1.1.0 [42], allowing uniquely mapped reads only. (A) Summary statistics for each sample. The averaged read quality was obtained using FastQC version 0.11.4 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc). The number of non-redundant reads, library complexity for 10 million mapped reads and FRiP scores were calculated using DROMPA3 version 3.0.0 [58]. Normalized strand coefficient (NSC) and relative strand correlation (RSC) scores were obtained using phantompeakqualtools version 1.1 (https://code.google.com/p/phantompeakqualtools). (B) The non-redundant read distribution for each sample with a RefSeq gene annotation (chromosome 1, 244.5–245.1 Mb). For the gene line (yellow box), genes in the upper and lower halves are on the forward and reverse strands, respectively. The green and blue histograms represent the read distributions of the ChIP and input samples for 100-bp bins, respectively. The reference peak regions are highlighted in red. Note that the y axis indicates the read number normalized for the number of non-redundant reads, whereas the reference peak lists were identified based on raw read numbers. The gene reference was obtained from the UCSC genome browser [59]. (C) Visualization of the ChIP/control enrichment distribution for 100-kb bins (chromosome 10). Bins with ChIP/control >1 are highlighted in red, and those with ChIP/control ≤1 are in gray. The GC contents and gene numbers for 500-kb windows are also plotted. Panels (B) and (C) were generated by DROMPA3.
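Two of the summary statistics named in panel (A) are simple ratios and can be illustrated with a minimal Python sketch. This is not DROMPA3's implementation: the function names, the tuple-based read representation and the half-open peak intervals are assumptions made for illustration, and the complexity measure here is computed on whatever reads are passed in rather than on a fixed subsample of 10 million mapped reads.

```python
from bisect import bisect_right


def nonredundant_fraction(reads):
    """Fraction of reads with a distinct (chrom, pos, strand) tuple.

    A rough stand-in for library complexity; hypothetical helper,
    not the DROMPA3 calculation.
    """
    reads = list(reads)
    if not reads:
        return 0.0
    return len(set(reads)) / len(reads)


def frip(read_positions, peaks):
    """Fraction of reads in peaks (FRiP).

    read_positions: list of (chrom, pos) tuples.
    peaks: list of (chrom, start, end), half-open and assumed
    non-overlapping within a chromosome.
    """
    by_chrom = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()

    in_peaks = 0
    for chrom, pos in read_positions:
        intervals = by_chrom.get(chrom, [])
        # Index of the last peak whose start is <= pos.
        i = bisect_right(intervals, (pos, float("inf"))) - 1
        if i >= 0 and intervals[i][0] <= pos < intervals[i][1]:
            in_peaks += 1
    return in_peaks / len(read_positions) if read_positions else 0.0
```

In practice the reads would come from an alignment file and the peaks from a peak list such as the reference lists mentioned above; both inputs are kept as plain Python lists here so the sketch stays self-contained.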
Figure 3.
ChIP/input enrichment distribution of S. cerevisiae (chromosome I, 136–162 kb). Data from [11]. Smc6, Nse4 and ‘No tag (negative control)’ ChIP-seq data in 100-bp bins with gene annotation obtained from the Saccharomyces Genome Database (http://www.yeastgenome.org). The reads were mapped onto the genome, allowing multiple mapped reads. For the yeast genome, inspecting the genome-wide ChIP/input enrichment distribution is effective because the read depth is large enough (>10-fold) and dividing by the input sample can minimize the technical and biological biases of the conditions. The enriched regions of Smc6 and Nse4 that overlap those of the ‘No tag’ sample (black arrows) suggest false positives (e.g. hyper-ChIPable regions).
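The per-bin ChIP/input enrichment described in this caption can be sketched as follows. This is a minimal illustration rather than the procedure used for the figure: the function names are hypothetical, reads are reduced to single 0-based positions on one chromosome, the ChIP sample is scaled to the input's total read count, and the pseudocount that avoids division by zero is an added assumption.

```python
def bin_counts(positions, chrom_length, bin_size=100):
    """Count read positions (0-based) in fixed-width bins."""
    n_bins = (chrom_length + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for pos in positions:
        if 0 <= pos < chrom_length:
            counts[pos // bin_size] += 1
    return counts


def enrichment(chip_positions, input_positions, chrom_length,
               bin_size=100, pseudocount=1.0):
    """Per-bin ChIP/input ratio after scaling ChIP to the input depth.

    The pseudocount is an assumption of this sketch, added so that
    bins without input reads do not cause a division by zero.
    """
    chip = bin_counts(chip_positions, chrom_length, bin_size)
    inp = bin_counts(input_positions, chrom_length, bin_size)
    chip_total, input_total = sum(chip), sum(inp)
    if chip_total == 0 or input_total == 0:
        raise ValueError("both samples need at least one read on the chromosome")
    scale = input_total / chip_total
    return [(c * scale + pseudocount) / (i + pseudocount)
            for c, i in zip(chip, inp)]
```

Running the same function on a negative-control sample (such as the ‘No tag’ data) and comparing the enriched bins is one simple way to flag regions that are enriched regardless of the target protein.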
Figure 4.
Spike-in analysis of H3K79me2 ChIP-seq data for 0%, 25%, 50%, 75% and 100% EPZ5676-treated Jurkat cells. Data from [96] (GEO under accession number GSE60104). Spike-in normalization was implemented using the number of reads uniquely mapped onto the fly genome (UCSC dm3). (A) Read distribution near the RPL13A gene locus for 100-bp bins. Left: total read normalization, right: spike-in normalization. (B) Aggregation plots of total read normalization (left) and spike-in normalization (right) from 5-kb upstream to 10-kb downstream of the TSSs of the RefSeq genes. Shaded regions indicate a 95% confidence interval. (A) and (B) reproduce Figure 3C and E in reference [96], respectively. (C) Log-scale relative enrichment of H3K79me2 for 25%, 50%, 75% and 100% treated cells against 0% treated cells near the RPL13A gene locus (chromosome 19, 49.88–50.13 Mb), with a 100-kb bin and 20-kb smoothing window. The top green line displays an H3K79me2 read distribution for 0% treated cells to roughly identify H3K79me2-enriched (green bars) and background regions (blue bars). Regions in which the enrichment (y-axis) is >1 and <1 indicate a relative increase and decrease, respectively.
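The difference between the two normalizations compared in panels (A) and (B) can be sketched in a few lines of Python. This is an illustrative simplification under stated assumptions, not the code used for the figure: the function names are hypothetical, and the scale factor is taken to be a constant divided by the number of reads mapped to the target genome (total read normalization) or to the fly genome (spike-in normalization).

```python
def total_read_scale(target_reads, per=1_000_000):
    """Scale factor for total read normalization (reads per million)."""
    if target_reads <= 0:
        raise ValueError("target_reads must be positive")
    return per / target_reads


def spike_in_scale(spike_in_reads, per=1_000_000):
    """Scale factor based on reads mapped to the spike-in (fly) genome."""
    if spike_in_reads <= 0:
        raise ValueError("spike_in_reads must be positive")
    return per / spike_in_reads


def normalize(counts, scale):
    """Apply a scale factor to per-bin read counts."""
    return [c * scale for c in counts]


# Hypothetical numbers, chosen only to show the effect.
untreated = {"human": 20_000_000, "fly": 2_000_000, "bins": [200]}
treated = {"human": 20_000_000, "fly": 8_000_000, "bins": [200]}

for name, s in (("untreated", untreated), ("treated", treated)):
    by_total = normalize(s["bins"], total_read_scale(s["human"]))
    by_spike = normalize(s["bins"], spike_in_scale(s["fly"]))
    print(name, by_total, by_spike)
```

With these made-up counts, total read normalization gives both samples about the same value, whereas spike-in normalization gives 100 for the untreated sample and 25 for the treated one, so a global reduction that total read normalization masks becomes visible.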

References

    1. Park PJ. ChIP-seq: advantages and challenges of a maturing technology. Nat Rev Genet 2009;10:669–80.
    2. Pepke S, Wold B, Mortazavi A. Computation for ChIP-seq and RNA-seq studies. Nat Methods 2009;6:S22–32.
    3. Furey TS. ChIP-seq and beyond: new and improved methodologies to detect and characterize protein-DNA interactions. Nat Rev Genet 2012;13:840–52.
    4. Deardorff MA, Bando M, Nakato R, et al. HDAC8 mutations in Cornelia de Lange syndrome affect the cohesin acetylation cycle. Nature 2012;489:313–17.
    5. Schaub MA, Boyle AP, Kundaje A, et al. Linking disease associations with regulatory information in the human genome. Genome Res 2012;22:1748–59.