Nucleic Acids Res. 39 (5), e27

EpiChIP: Gene-By-Gene Quantification of Epigenetic Modification Levels

Daniel Hebenstreit et al. Nucleic Acids Res.

Abstract

The combination of chromatin immunoprecipitation with next-generation sequencing technology (ChIP-seq) is a powerful and increasingly popular method for mapping protein-DNA interactions genome-wide. The conventional way of analyzing these data is to identify sequencing peaks along the chromosomes that are significantly higher than the read background. For histone modifications and other epigenetic marks, it is often preferable to find a characteristic region of read enrichment relative to gene annotations. For instance, many histone modifications are typically enriched around transcription start sites (TSSs). Calculating the optimal window that describes this enrichment allows one to quantify modification levels for each individual gene. Using data sets for the H3K9/14ac histone modification in Th cells and an accompanying IgG control, we present an analysis strategy that alternates between the single-gene and global data-distribution levels and allows a clear distinction between experimental background and signal. Curve fitting permits false discovery rate (FDR)-based classification of genes as modified versus unmodified. We have developed a software package called EpiChIP that carries out this type of analysis, including integration with and visualization of gene expression data.
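Quantifying each gene can be pictured as counting reads in a fixed window around its transcription start site (TSS). The Python code below is a minimal sketch under stated assumptions, not EpiChIP's code: reads are assumed to be reduced to single sorted positions per chromosome (for example, strand-shifted 5′ ends), the default window is the −400/+807 bp window from the figure legends, and the window is mirrored for minus-strand genes. `Gene`, `window_bounds` and `count_reads_per_gene` are hypothetical names; the window optimization and the NLCS normalization are not reproduced.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    chrom: str
    tss: int      # same coordinate convention as the read positions
    strand: str   # "+" or "-"


def window_bounds(gene, upstream=400, downstream=807):
    """Inclusive genomic (start, end) of the TSS-relative window.

    For minus-strand genes, 'downstream' points to lower coordinates,
    so the upstream and downstream extents are swapped."""
    if gene.strand == "+":
        return gene.tss - upstream, gene.tss + downstream
    return gene.tss - downstream, gene.tss + upstream


def count_reads_per_gene(genes, reads_by_chrom, upstream=400, downstream=807):
    """reads_by_chrom maps chromosome -> sorted list of read positions.

    Binary search gives the number of positions inside each window
    without scanning all reads."""
    counts = {}
    for g in genes:
        positions = reads_by_chrom.get(g.chrom, [])
        start, end = window_bounds(g, upstream, downstream)
        counts[g.name] = bisect_right(positions, end) - bisect_left(positions, start)
    return counts


if __name__ == "__main__":
    genes = [Gene("geneA", "chr1", 1000, "+"), Gene("geneB", "chr1", 5000, "-")]
    reads = {"chr1": [650, 700, 1500, 1900, 4100, 4300, 5300, 5500]}
    print(count_reads_per_gene(genes, reads))  # {'geneA': 3, 'geneB': 2}
```

Keeping one sorted position list per chromosome makes each window query logarithmic in the number of reads, which matters when the same reads are re-counted for many candidate windows.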

Figures

Figure 1.
ChIP-seq data distribution for the H3K9/14ac histone modification. (A) The cumulative read density for the whole genome is shown from −5 kb to +5 kb relative to transcription start sites (TSSs). The H3K9/14ac sample (blue) shows a strong enrichment within the first kb downstream of the TSS. The IgG control (black) shows a much weaker enrichment in this region. (B) Kernel density estimates of the distributions of all genes with respect to NLCS values within the window from −400 to +807 bp relative to TSSs. The IgG control (black/top) and the H3K9/14ac sample (blue/bottom) are shown on linear (left) and log2 (right) scales. The dotted lines represent the signal distributions of random intergenic regions of the same window size. The shapes of the data distributions suggest that the H3K9/14ac sample consists of two separate distributions, the experimental/biological background (BG) and the actual histone-modification signal (HM). (C) Mathematical modeling of the IgG control-data distribution. The genome-wide distribution of the numbers of sequencing reads within the −400/+807 bp window around TSSs (not XSET processed) is shown as a histogram on linear (left) and log2 (right) scales. Numerical maximum likelihood fits of truncated Poisson (red), normal (cyan), lognormal (purple) and truncated normal (green) distributions are overlaid. Parameters and Bayesian information criterion (BIC) values are given in Supplementary Table S2.
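Panel C compares candidate distributions by maximum likelihood and BIC. As a minimal stdlib-only sketch, the code below fits an untruncated normal and a log-normal distribution with their closed-form maximum-likelihood estimates and scores each with BIC = k ln n − 2 ln L̂ (lower is better). It is not the numerical fitting used for the figure: the truncated Poisson and truncated normal fits are omitted, zero counts are dropped so that both models are scored on the same positive values (an assumption), and all function names are hypothetical.

```python
import math


def normal_fit_loglik(xs):
    """Closed-form normal MLE; returns (maximized log-likelihood, (mu, sigma))."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n  # the MLE divides by n, not n - 1
    # At the MLE, sum((x - mu)^2) / (2 * var) == n / 2, giving this compact form.
    return -0.5 * n * (math.log(2 * math.pi * var) + 1), (mu, math.sqrt(var))


def lognormal_fit_loglik(xs):
    """Closed-form log-normal MLE on strictly positive data."""
    logs = [math.log(x) for x in xs]
    ll_logs, params = normal_fit_loglik(logs)
    # Change of variables: f_X(x) = f_Y(log x) / x, so subtract sum(log x).
    return ll_logs - sum(logs), params


def bic(loglik, k, n):
    """Bayesian information criterion for k free parameters; lower is better."""
    return k * math.log(n) - 2 * loglik


def compare_models(counts):
    """BIC of a normal and a log-normal fit to the positive read counts."""
    xs = [c for c in counts if c > 0]  # assumption: zero counts are dropped
    n = len(xs)
    ll_norm, _ = normal_fit_loglik(xs)
    ll_lnorm, _ = lognormal_fit_loglik(xs)
    return {"normal": bic(ll_norm, 2, n), "lognormal": bic(ll_lnorm, 2, n)}
```

Both models have two free parameters, so here BIC ranks them exactly as the maximized log-likelihood does; the penalty term only starts to matter when families with different parameter counts (such as the one-parameter Poisson) are compared.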
Figure 2.
(A) Mathematical modeling of the H3K9/14ac sample data. A combination of a normal distribution (for BG) and a lognormal distribution (for HM) was fit to the NLCS data (from the −400/+807-bp TSS window, as shown in Figure 1B). The experimental data are shown in blue, the BG curve in orange, the HM curve in purple and the sum of the latter two in red. The fit was based on parameter estimation by expectation maximization. Parameters are given in Supplementary Table S3. Alternative fits are shown in Supplementary Figure S2. The grey lines indicate the thresholds at a false discovery rate (FDR) of 0.01. (B) Expression levels of genes in the BG and HM categories. The expression levels are significantly different (P < 2.2 × 10⁻¹⁶, one-sided Wilcoxon test). (C) Plot of histone modification versus gene expression for each gene. The heatmap represents a 2D kernel density estimate of ∼15 000 genes.
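The mixture fit in panel A can be sketched with a textbook expectation-maximization loop for one normal (BG) and one log-normal (HM) component. The Python code below is a minimal sketch under stated assumptions, not EpiChIP's implementation: the initialization, the iteration count and the FDR rule are illustrative choices, and `fit_mixture` and `fdr_threshold` are hypothetical names. Here a gene is called HM when its value is at or above a threshold t, and the FDR at t is estimated as the fitted BG share of all genes above t; the figure's grey lines may be defined differently.

```python
import math
import random
from statistics import NormalDist


def lognorm_sf(x, m, s):
    """Survival function P(X > x) of a log-normal with log-mean m, log-sd s."""
    return 1.0 if x <= 0 else 1.0 - NormalDist(m, s).cdf(math.log(x))


def fit_mixture(xs, n_iter=100):
    """EM fit of w * LogNormal(m, s) [HM] + (1 - w) * Normal(mu, sd) [BG].

    Returns the tuple (w, mu, sd, m, s)."""
    # Crude start (assumption): lower half of the values seeds BG, upper half HM.
    srt = sorted(xs)
    half = len(srt) // 2
    bg0 = NormalDist.from_samples(srt[:half])
    hm0 = NormalDist.from_samples([math.log(x) for x in srt[half:] if x > 0])
    w, mu, sd, m, s = 0.5, bg0.mean, bg0.stdev, hm0.mean, hm0.stdev
    for _ in range(n_iter):
        # E-step: posterior probability r_i that value x_i belongs to HM.
        bg, hm = NormalDist(mu, sd), NormalDist(m, s)
        r = []
        for x in xs:
            # Log-normal density via change of variables; zero for x <= 0.
            p_hm = w * hm.pdf(math.log(x)) / x if x > 0 else 0.0
            p_bg = (1 - w) * bg.pdf(x)
            tot = p_hm + p_bg
            r.append(p_hm / tot if tot > 0 else float(x > max(mu, 0.0)))
        # M-step: responsibility-weighted maximum-likelihood updates.
        n_hm = sum(r)
        n_bg = len(xs) - n_hm
        w = n_hm / len(xs)
        mu = sum((1 - ri) * x for ri, x in zip(r, xs)) / n_bg
        sd = math.sqrt(sum((1 - ri) * (x - mu) ** 2 for ri, x in zip(r, xs)) / n_bg)
        logs = [(ri, math.log(x)) for ri, x in zip(r, xs) if x > 0]
        m = sum(ri * lx for ri, lx in logs) / n_hm
        s = math.sqrt(sum(ri * (lx - m) ** 2 for ri, lx in logs) / n_hm)
    return w, mu, sd, m, s


def fdr_threshold(xs, w, mu, sd, m, s, alpha=0.01):
    """Smallest observed value t whose estimated FDR (fitted BG share of
    all genes above t) is at most alpha; None if no value qualifies."""
    bg = NormalDist(mu, sd)
    for t in sorted(xs):
        bg_tail = (1 - w) * (1.0 - bg.cdf(t))
        hm_tail = w * lognorm_sf(t, m, s)
        if bg_tail + hm_tail > 0 and bg_tail / (bg_tail + hm_tail) <= alpha:
            return t
    return None


if __name__ == "__main__":
    random.seed(1)
    # Simulated values with made-up parameters: 2000 BG and 1000 HM genes.
    xs = [random.gauss(0, 1) for _ in range(2000)]
    xs += [random.lognormvariate(1.5, 0.4) for _ in range(1000)]
    params = fit_mixture(xs)
    print("w, mu, sd, m, s =", [round(p, 3) for p in params])
    print("HM threshold at FDR 0.01:", fdr_threshold(xs, *params))
```

Because the BG component has a light (normal) tail and the HM component a heavy (log-normal) one, the estimated FDR typically falls as t increases, so stopping at the first sorted value that meets alpha gives the least stringent qualifying cutoff.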
Figure 3.
Overview of the analysis strategy.
Figure 4.
EpiChIP screenshots for analysis examples. Types of histone modifications and analysis windows are as indicated: (A) H3K9me1, (B) H3K27me3 and (C) H3K36me3.
