Epigenetics Chromatin. 2020 Apr 22;13(1):22.
doi: 10.1186/s13072-020-00342-y.

ATAC-seq Normalization Method Can Significantly Affect Differential Accessibility Analysis and Interpretation

Free PMC article

Jake J Reske et al. Epigenetics Chromatin.

Abstract

Background: Chromatin dysregulation is associated with developmental disorders and cancer. Numerous methods for measuring genome-wide chromatin accessibility have been developed in the genomic era to interrogate the function of chromatin regulators. A recent technique which has gained widespread use due to speed and low input requirements with native chromatin is the Assay for Transposase-Accessible Chromatin, or ATAC-seq. Biologists have since used this method to compare chromatin accessibility between two cellular conditions. However, approaches for calculating differential accessibility can yield conflicting results, and little emphasis is placed on choice of normalization method during differential ATAC-seq analysis, especially when global chromatin alterations might be expected.

Results: Using an in vivo ATAC-seq data set generated in our recent report, we observed differences in chromatin accessibility patterns depending on the data normalization method used to calculate differential accessibility. This observation was further verified on published ATAC-seq data from yeast. We propose a generalized workflow for differential accessibility analysis using ATAC-seq data. We further show this workflow identifies sites of differential chromatin accessibility that correlate with gene expression and is sensitive to differential analysis using negative controls.

Conclusions: We argue that researchers should systematically compare multiple normalization methods before continuing with differential accessibility analysis. ATAC-seq users should be aware of the interpretations of potential bias within experimental data and the assumptions of the normalization method implemented.
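
As a toy illustration of why the scaling choice matters, the sketch below computes MA values (average log2 abundance and log2 difference) for the same counts under two scaling factors: full library size and reads in peaks only. All counts and library sizes are hypothetical, the helper names `log2_cpm` and `ma_values` are invented for this example, and the code is not the authors' pipeline or the DiffBind/csaw implementation.

```python
import math

def log2_cpm(counts, library_size, prior=0.5):
    """Log2 counts per million, with a small prior count to avoid log(0)."""
    return [math.log2((c + prior) / library_size * 1e6) for c in counts]

def ma_values(control, treat, control_size, treat_size):
    """Return (A, M) per region: mean log2 abundance and log2 difference."""
    lc = log2_cpm(control, control_size)
    lt = log2_cpm(treat, treat_size)
    return [((c + t) / 2, t - c) for c, t in zip(lc, lt)]

def median(values):
    """Median of an odd-length list (enough for this toy example)."""
    return sorted(values)[len(values) // 2]

# Hypothetical counts for five peaks: treatment loses half its signal everywhere.
control_peaks = [100, 200, 400, 800, 1600]
treat_peaks = [50, 100, 200, 400, 800]

# Assumed total sequenced fragments per library (equal sequencing depth).
control_total, treat_total = 1_000_000, 1_000_000

# Scaling by full library size preserves the global loss of accessibility.
full = ma_values(control_peaks, treat_peaks, control_total, treat_total)

# Scaling by reads in peaks rescales the global loss away.
in_peaks = ma_values(control_peaks, treat_peaks,
                     sum(control_peaks), sum(treat_peaks))

median_m_full = median([m for _, m in full])
median_m_peaks = median([m for _, m in in_peaks])
```

In this constructed case the full-library scaling reports a roughly twofold global decrease, while reads-in-peaks scaling centers the same data near zero. This is the kind of disagreement that comparing normalization methods side by side is meant to expose.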

Keywords: ATAC-seq; Bioinformatics; Chromatin accessibility; Differential accessibility; Genomics; Normalization.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
DA distributions from the same ATAC-seq data set analyzed by 8 different DA approaches. Example MA plots for ATAC-enriched regions of interest analyzed for differential accessibility by different approaches. I and II are from DiffBind using MACS2 peak sets and with scaling factors derived from full libraries or reads in peaks only, respectively. III and IV are from csaw using MACS2 peak sets as query regions with either a TMM or non-linear loess-based normalization method. Likewise, V and VI are from csaw, but instead using de novo query regions identified through local neighborhood enrichment. VII was calculated using MACS2 peak sets transformed to log2 counts per million (log2CPM) by voom which is further quantile normalized in VIII. MA plot X-axis represents average ATAC signal abundance at that region, while Y-axis is the log2 difference in ATAC signal between the two conditions. Black dots represent non-significant regions, and red dots represent significant (FDR < 0.10) DA regions. Blue lines are loess fits to each distribution with 95% confidence intervals shaded in gray
Fig. 2
Output comparison of approaches for computing differential accessibility. a Output comparison of 8 approaches described in Fig. 1 for calling significant DA regions in ATAC-seq data, separated by increasing vs. decreasing accessibility regions. b Comparison of same 8 approaches divided by significant DA promoter regions (within 3 kb of a TSS) vs. distal (further than 3 kb of a TSS). c Comparison of significant DA promoter regions in all 8 approaches segregated by increasing vs. decreasing accessibility. d Quantification of overlapping genes associated with a significant DA promoter region between all 8 approaches. e Gene set enrichment of Hallmark MSigDB pathways among genes with DA promoters for all 8 approaches. Enrichment displayed as observed/expected ratio, where red values indicate pathway overrepresentation
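
The promoter versus distal split in panel b can be sketched as a distance check against annotated TSS positions. This is a minimal sketch: the function name `classify_region`, the use of region edges rather than the region midpoint, and the inclusive 3 kb cutoff are assumptions, since the caption does not say how distance was measured.

```python
from bisect import bisect_left

def classify_region(region_start, region_end, tss_positions, window=3000):
    """Label a region 'promoter' if it lies within `window` bp of a TSS.

    `tss_positions` must be a sorted list of TSS coordinates on the same
    chromosome as the region. Distance is measured from the nearest region
    edge (an assumption), and is 0 if the TSS falls inside the region.
    """
    i = bisect_left(tss_positions, region_start)
    # Only the TSS just upstream of the region and the first TSS at or
    # after its start can be the nearest one.
    for tss in tss_positions[max(i - 1, 0): i + 1]:
        if region_start <= tss <= region_end:
            distance = 0
        else:
            distance = min(abs(tss - region_start), abs(tss - region_end))
        if distance <= window:
            return "promoter"
    return "distal"

# Hypothetical TSS coordinates on one chromosome.
tss = [10_000, 50_000]
labels = [
    classify_region(12_000, 12_500, tss),  # 2 kb downstream of a TSS
    classify_region(20_000, 20_500, tss),  # 10 kb from the nearest TSS
    classify_region(9_000, 11_000, tss),   # overlaps a TSS
]
```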
Fig. 3
Comprehensive DA analysis and gene expression comparisons of yeast osmotic time-course series. Time series analysis of the Schep et al. osmotic stress in yeast ATAC-seq data set with all 8 DA approaches. MA plots are shown for 15-min exposure vs. 0-min controls and exemplary of global effects of data normalization. Time-course line plots depict the mean change in accessibility at each time point compared to control samples, for all gene promoter ATAC regions defined by respective gene expression changes. Gene expression changes following the same 0.6 M NaCl treatment reported by Ni et al. are defined as stable expression (gray line), upregulated expression (red line), and downregulated expression (blue line). See Additional file 1: Figure S4 for complete data and statistical analysis of time-course series with all 8 approaches
Fig. 4
Generalized ATAC-seq data processing workflow intended for comparative analysis. Stepwise bioinformatics process and example commands for analyzing ATAC-seq data from raw reads to calling peaks for downstream differential accessibility analysis. Consider “treat1” as an example mouse ATAC-seq Illumina paired-end library. Blue text denotes optional or conditional steps dependent on experimental design and desired output. Users seeking only to discover replicate-concordant accessible regions in a singular cell state may wish to call naïve overlapping peaks, though this step is not necessary for differential accessibility analysis. Bash scripts for Tn5 coordinate shift (bedpeTn5shift.sh), minimal BEDPE format conversion (bedpeMinimalConvert.sh), and calling naïve overlap broad peaks (naiveOverlapBroad.sh) are located in the additional files section along with a machine-readable text version of this workflow
Fig. 5
Conservative and relevant peak calling by proposed framework exemplified on Buenrostro et al. data. a Overlap of MACS2 broad peaks called with proposed workflow between independent GM12878 ATAC-seq replicates from Buenrostro et al. Naïve overlap identifies 99.8% of fully replicate-intersecting peaks. b Genome-wide overlap of naïve overlap peak set generated herein compared to ZINBA peak set reported by Buenrostro et al. c Overlap of genes with detected ATAC promoters identified in the two peak sets as in b. d Overlap of expression-measured genes with detected ATAC promoters in the two peak sets compared to all measured genes. GM12878 expression data was pulled from a microarray data set generated by Ernst et al. e Microarray log2 expression levels (RMA) of genes segregated by promoter ATAC peak status detected between the two peak sets. Genes were binned as having a detected peak in both sets, only by naïve overlap herein, only by Buenrostro et al. ZINBA, or neither. Statistic is unpaired, two-tailed Wilcoxon test. f Correlation of promoter ATAC peak signal and gene expression for 5508 genes with a detected promoter ATAC peak in both peak sets. ATAC signal is quantified by reads in peak (log10 scale; linear values displayed on axis for clarity), and the strongest value was selected to represent promoters with multiple peaks. Correlation statistics displayed are Pearson and Spearman. Overlaid linear fit is displayed in red and loess in blue. Fisher Z-transformation was used to compare correlation coefficients between both peak sets. g Example ATAC-seq signal tracks showing peaks called (black bars) at different loci between the two peak sets. All three replicates are overlaid with darker colors representing overlapping replicates. Y-axis is log likelihood ratio of peak signal
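
The Fisher Z-transformation mentioned for panel f can be sketched as follows. This is the textbook test for two correlation coefficients from independent samples. The caption does not say which variant the authors used, and their two correlations share the same genes, so the independence assumption here may not match their analysis. The function name `fisher_z_compare` and the input values are hypothetical.

```python
import math

def fisher_z_compare(r1, n1, r2, n2):
    """Compare two correlation coefficients via Fisher's z-transformation.

    Assumes the two correlations come from independent samples of size
    n1 and n2. Returns the z statistic and a two-sided p-value.
    """
    z1 = math.atanh(r1)  # Fisher transform: 0.5 * ln((1 + r) / (1 - r))
    z2 = math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Hypothetical correlation values; only the sample size (5508 genes)
# is taken from the caption.
z_stat, p_value = fisher_z_compare(0.50, 5508, 0.45, 5508)
```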
Fig. 6
csaw workflow for multiple differential accessibility analyses in R. Graphical representation of proposed csaw workflow in R for calculating differential accessibility. Consider an experimental design with n = 2 biological replicates from two conditions: “treat” and “control”
