Genome Res. 2014 Jul;24(7):1157-68.
doi: 10.1101/gr.168260.113. Epub 2014 Apr 7.

Quantifying ChIP-seq Data: A Spiking Method Providing an Internal Reference for Sample-To-Sample Normalization

Free PMC article


Nicolas Bonhoure et al. Genome Res. 2014.

Abstract

Chromatin immunoprecipitation followed by deep sequencing (ChIP-seq) experiments are widely used to determine, within entire genomes, the occupancy sites of any protein of interest, including, for example, transcription factors, RNA polymerases, or histones with or without various modifications. In addition to allowing the determination of occupancy sites within one cell type and under one condition, this method allows, in principle, the establishment and comparison of occupancy maps in various cell types, tissues, and conditions. Such comparisons require, however, that samples be normalized. Widely used normalization methods that include a quantile normalization step perform well when factor occupancy varies at a subset of sites, but may miss uniform genome-wide increases or decreases in site occupancy. We describe a spike adjustment procedure (SAP) that, unlike commonly used normalization methods, which act at the analysis stage, adds an experimental step before immunoprecipitation. A constant, small amount of chromatin from a foreign genome, taken from a single batch, is added to the experimental chromatin. This "spike" chromatin then serves as an internal control to which the experimental signals can be adjusted. We show that the method improves similarity between replicates and reveals biological differences, including global and largely uniform changes.
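The abstract's claim that a quantile normalization step can miss uniform genome-wide changes is easy to check on toy numbers. The sketch below is a textbook quantile normalization written for illustration only; the function name `quantile_normalize` and the tie-breaking by position are assumptions, not taken from the paper. It is applied to two samples in which every site is uniformly decreased two-fold.

```python
# Illustrative textbook quantile normalization (hypothetical helper, not the
# authors' code). Each sample is a list of per-site scores of equal length.

def quantile_normalize(samples):
    """Replace each value by the mean, across samples, of the values with the same rank."""
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same number of sites")
    # Sort each sample once to obtain its quantiles.
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [sum(col) / len(samples) for col in zip(*sorted_samples)]
    normalized = []
    for s in samples:
        # Rank order of the sites in this sample (ties broken by position).
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, site in enumerate(order):
            out[site] = reference[rank]
        normalized.append(out)
    return normalized


sample1 = [100.0, 40.0, 20.0, 80.0]
sample2 = [x / 2 for x in sample1]  # uniform two-fold decrease at every site
norm1, norm2 = quantile_normalize([sample1, sample2])
print(norm1 == norm2)  # True: the uniform decrease is erased
```

Because quantile normalization forces all samples onto one reference distribution, a decrease that is the same at every site leaves no trace, whether it came from technical variation or from biology.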

Figures

Figure 1.
Normalization can obscure global effects. (A) Schematic representation of peaks obtained after ChIP-seq in a hypothetical example where all peaks are uniformly diminished in the second (purple) sample compared with the first (light blue). These samples can represent a replicate experiment, in which case the overall decrease observed in the second sample is the result of experimental variation, or they can represent experiments performed with samples collected under different conditions, in which case the global decrease might reflect a biological difference. No spike chromatin is included. (B) Normalization by scaling to the total number of tags aligned onto the genome (i.e., normalization for sequencing depth), showing tag counts (top) and log2 fold change (bottom). In this hypothetical example, the number of tags aligned onto the genome is similar in both samples, and this type of normalization indicates a general decrease for each peak in the second sample whether the two samples are biologically different (and should thus indeed show a decrease in protein occupancy in sample 2) or similar (and should in fact display similar signals). (C) Normalization by scaling followed by quantile normalization, showing tag counts (top) and log2 fold change (bottom). In this example, the second step (quantile normalization) equalizes the sample distributions whether or not the samples are biologically different, because the decrease in sample 2 is uniform. In D and E, spike chromatin is included in the sample and gives rise to signals symbolized by the yellow bars. (F,G) Normalization by scaling followed by spike adjustment, showing tag counts (top) and log2 fold change (bottom). In F, the spike adjustment factor increased the signals in sample 2 by a factor of about two; in G, it multiplied the signals in sample 2 by about 0.8 (see yellow bars). Spike adjustment reveals whether the samples are in fact similar (example in F) or biologically different (example in G).
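The logic of panels F and G can be sketched with made-up numbers. This is a simplified illustration under a stated assumption: the spike adjustment is taken here to be one global factor, the ratio of total spike-chromatin signal in a reference sample to that in the sample being adjusted, applied after depth scaling. The helper names are hypothetical, and the SAP itself may estimate the factor differently.

```python
# Hypothetical sketch: depth scaling followed by a single global spike factor.
# The factor definition (ratio of total spike signal) is an assumption made
# for illustration, not the exact procedure of the SAP.

def scale_to_depth(counts, genome_aligned_tags, per=1_000_000):
    """Scale raw tag counts to tags per `per` genome-aligned tags."""
    return [c * per / genome_aligned_tags for c in counts]


def spike_factor(ref_spike, sample_spike):
    """Ratio of total spike signal in the reference sample to that in the sample."""
    return sum(ref_spike) / sum(sample_spike)


def spike_adjust(scores, factor):
    """Multiply every experimental score by the same spike-derived factor."""
    return [s * factor for s in scores]


depth = 10_000_000  # both samples: 10 M genome-aligned tags (made-up numbers)
s1_peaks = scale_to_depth([200, 100, 50], depth)
s1_spike = scale_to_depth([40, 20], depth)

# Case like panel F: peaks AND spike both halved in sample 2 (technical variation).
s2_peaks = scale_to_depth([100, 50, 25], depth)
s2_spike_f = scale_to_depth([20, 10], depth)
factor_f = spike_factor(s1_spike, s2_spike_f)  # 2.0
adjusted_f = spike_adjust(s2_peaks, factor_f)  # back to sample 1 levels

# Case like panel G: peaks halved, but spike signal higher in sample 2.
s2_spike_g = scale_to_depth([50, 25], depth)
factor_g = spike_factor(s1_spike, s2_spike_g)  # about 0.8
adjusted_g = spike_adjust(s2_peaks, factor_g)  # the decrease is kept

print(factor_f, adjusted_f)
print(round(factor_g, 3), [round(v, 3) for v in adjusted_g])
```

Because the same amount of spike chromatin goes into every sample, a lower spike signal points to a less efficient ChIP, and scaling by the spike ratio removes that technical difference while leaving genuine occupancy changes visible.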
Figure 2.
Schematic diagram summarizing the SAP. The main steps, i.e., examination of sample quality, scaling to the total number of genome-aligned tags, selection of signal genes, score calculation, and spike adjustment, are numbered.
Figure 3.
The spike chromatin can be used for quality control. Mean-difference scatter plot of human Pol III genome bin counts (in log scale). Red dots indicate genomic bins that overlap with Pol III loci. The genome was binned into 400-bp bins (corresponding to a typical Pol III gene length [∼100 bp] extended by 150 bp in both the upstream and downstream directions). Zero-count bins were filtered out prior to plotting. (A) An example of a good-quality sample (90_R1). (B) An example of a poor-quality sample (97.5_P1).
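The binning behind Figure 3 can be sketched as follows. Only the 400-bp bin width comes from the caption (100 bp plus 150 bp on each side); everything else is an assumption for illustration: reads are reduced to 0-based positions on a single chromosome, the helper names are hypothetical, and bins with zero counts in either of the two compared count sets are dropped because the log of zero is undefined. The caption does not say which two count sets the mean-difference plot compares, so `counts_x` and `counts_y` are placeholders.

```python
import math
from collections import Counter

GENE_LENGTH = 100  # typical Pol III gene length (bp), from the caption
FLANK = 150  # extension upstream and downstream (bp), from the caption
BIN_SIZE = GENE_LENGTH + 2 * FLANK  # 400 bp


def bin_counts(read_positions, bin_size=BIN_SIZE):
    """Hypothetical helper: count reads per fixed-width bin (0-based, one chromosome)."""
    return Counter(pos // bin_size for pos in read_positions)


def mean_difference(counts_x, counts_y):
    """Hypothetical helper: (A, M) per bin, keeping bins non-zero in both count sets."""
    points = {}
    for b in counts_x.keys() & counts_y.keys():
        lx, ly = math.log2(counts_x[b]), math.log2(counts_y[b])
        a = 0.5 * (lx + ly)  # mean of the log counts
        m = lx - ly  # difference of the log counts
        points[b] = (a, m)
    return points


counts_x = bin_counts([10, 50, 390, 450, 820])
counts_y = bin_counts([20, 410, 430, 1300])
pts = mean_difference(counts_x, counts_y)
print(sorted(pts.items()))
```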
Figure 4.
The SAP tolerates sample-to-sample differences of average chromatin fragment length. (A) Illustration of two hypothetical cases. (Top) The mouse chromatin sample (blue) is sonicated to an average size >500 bp; (bottom) the average size is <500 bp. The human chromatin (red) used to spike the samples is from the same batch and has an average size of 500 bp. Size selection from 200 to 400 bp is expected to result in a smaller proportion of mouse chromatin in the first case than in the second case. (B) Size representation obtained by fragment analyzer (top) and 1% agarose gel electrophoresis (bottom) of three mouse chromatin samples sonicated for 5 (S5), 10 (S10), and 15 (S15) cycles of 10 sec, as indicated above the lanes. The position of DNA size markers (in bp) is indicated on the left. The last lane shows the human chromatin spike sample. (C) Scatter plots showing the relation of mouse Pol III loci scores before and after spike adjustment for the three pairs of samples sonicated for different amounts of time. The Pearson and Spearman correlations before and after spike adjustment were as follows: 97.5_S5 versus 97.5_S10, 0.9927→0.9935 and 0.9678→0.9653; 97.5_S5 versus 97.5_S15, 0.9900→0.9885 and 0.9728→0.9663; and 97.5_S10 versus 97.5_S15, 0.9917→0.9926 and 0.9626→0.9636.
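The caption reports Pearson (linear) and Spearman (rank) correlations between the Pol III loci scores of two samples. A minimal stdlib-only sketch of both follows; the helper names are hypothetical, and giving tied values their average rank is an assumption, since the caption does not state how ties were handled.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values receive their average rank (assumed tie rule)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


x = [12.0, 30.0, 7.5, 51.0, 20.0]  # made-up locus scores, sample A
y = [10.0, 33.0, 8.0, 47.0, 22.0]  # made-up locus scores, sample B
print(pearson(x, y), spearman(x, y))
```

Multiplying every score of one sample by the same positive constant leaves both coefficients unchanged, so a single global scale factor on its own would not alter them.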
Figure 5.
Spike adjustment improves similarity between replicates and reveals genuine differences in Pol III occupancy. (A,B) Scatter plots showing the relation of Pol III loci scores between the two WT (A) and the two Maf1 KO (B) replicate samples before (orange) and after (black) spike adjustment. The red line corresponds to x = y. (C–E) Boxplot representations of the Pol III loci score distributions for the two WT samples (light and dark green, mR1_WT and mR2_WT) and the two Maf1 KO samples (light and dark blue, mR1_KO and mR2_KO). The scores were normalized to the total number of tags aligned onto the genome (C), followed by either quantile normalization (D) or spike adjustment (E). (F–H) Empirical cumulative distribution functions (ECDFs) of the log scores of the indicated distributions. Samples were normalized to the total number of tags aligned onto the genome (F), followed by either quantile normalization (G) or spike adjustment (H). The Kolmogorov-Smirnov (KS) distance for the two WT (green lines) and the two Maf1 KO (blue lines) samples is shown at the bottom right of each panel. (I,J) Mean-difference scatter plots illustrating Pol III occupancy in WT and Maf1 KO livers. Samples were normalized to the total number of tags aligned onto the genome, followed by quantile normalization (I) or by spike adjustment (J). Scores for WT and KO conditions are the average of the two replicates. Loci with scores showing a significant difference between WT and Maf1 KO samples are shown as yellow (P ≤ 0.01) and red (0.01 < P ≤ 0.05) dots.
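Panels F to H compare replicates through their ECDFs and the Kolmogorov-Smirnov (KS) distance, the largest vertical gap between two ECDFs. A minimal stdlib-only sketch follows (helper names hypothetical); SciPy's `scipy.stats.ks_2samp` reports the same two-sample statistic together with a p-value.

```python
from bisect import bisect_right


def ecdf(sample):
    """Hypothetical helper: return F(t), the fraction of values <= t."""
    data = sorted(sample)
    n = len(data)
    return lambda t: bisect_right(data, t) / n


def ks_distance(a, b):
    """Two-sample KS statistic: max |F_a(t) - F_b(t)| over all t."""
    fa, fb = ecdf(a), ecdf(b)
    # For step-function ECDFs the maximum gap is reached at an observed value.
    return max(abs(fa(t) - fb(t)) for t in set(a) | set(b))


print(ks_distance([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```

The KS distance depends only on the ordering of the pooled values, so applying the same strictly increasing transform to both samples (such as the log used in the figure) does not change it.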
Figure 6.
Spike adjustment improves the similarity of two Pol II ChIP-seq replicate experiments. (A) ECDFs of the scores of the indicated distributions. Preliminary scores were computed around the TSS (±250 bp) with the SPP software. The KS distance is shown at the bottom right of each panel. (Dark line) RPB2_90 sample; (light line) RPB2_95 sample. (B) Scatter plots showing the relation between the RPB2_90 and RPB2_95 scores before (orange dots) and after (black dots) spike adjustment. The red line corresponds to x = y.
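The Figure 6 scores were computed with SPP in ±250-bp windows around each TSS. SPP's own scoring is not reproduced here; as a much simpler stand-in, the sketch below only counts reads whose positions fall inside each window. The function name and the representation of reads and TSSs as 0-based positions on a single chromosome are hypothetical.

```python
from bisect import bisect_left, bisect_right


def tss_window_counts(read_positions, tss_positions, flank=250):
    """Hypothetical stand-in for SPP scoring: reads within [TSS - flank, TSS + flank]."""
    reads = sorted(read_positions)
    return [
        bisect_right(reads, t + flank) - bisect_left(reads, t - flank)
        for t in tss_positions
    ]


print(tss_window_counts([100, 300, 600, 1000, 1240], [350, 1000]))  # [3, 2]
```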


Cited by 46 articles
