AIAP: A Quality Control and Integrative Analysis Package to Improve ATAC-seq Data Analysis

Genomics Proteomics Bioinformatics. 2021 Aug;19(4):641-651. doi: 10.1016/j.gpb.2020.06.025. Epub 2021 Jul 15.

Abstract

Assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq) is a widely used technique for investigating genome-wide chromatin accessibility. The recently published Omni-ATAC-seq protocol substantially improves the signal-to-noise ratio and reduces the number of input cells required. High-quality data are critical for accurate analysis. Several tools have been developed to assess sequencing quality and insertion size distribution for ATAC-seq data; however, key quality control (QC) metrics for accurately determining the quality of ATAC-seq data have not yet been established. Here, we optimized the analysis strategy for ATAC-seq and defined a series of QC metrics for ATAC-seq data, including reads under peak ratio (RUPr), background (BG), promoter enrichment (ProEn), subsampling enrichment (SubEn), and other measurements. We incorporated these QC tests into our recently developed ATAC-seq Integrative Analysis Package (AIAP) to provide a complete ATAC-seq analysis system, including quality assurance, improved peak calling, and downstream differential analysis. Processing paired-end ATAC-seq datasets with AIAP yielded a significant improvement in sensitivity (20%-60%) in both peak calling and differential analysis. AIAP is packaged as a Docker/Singularity image and can be run with a single command to generate a comprehensive QC report. We used ENCODE ATAC-seq data for benchmarking and to generate QC recommendations, and developed qATACViewer for user-friendly interaction with the QC report. The software, source code, and documentation of AIAP are freely available at https://github.com/Zhang-lab/ATAC-seq_QC_analysis.
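
To make one of the QC metrics named above concrete, the following is a minimal Python sketch of a reads-under-peak ratio. It rests on an assumption: the abstract does not give the formula, so RUPr is taken here to be the fraction of reads that overlap at least one called peak by one base pair or more. The function names (`merge_intervals`, `reads_under_peak_ratio`) and the single-chromosome, half-open interval representation are hypothetical and chosen for illustration; this is not AIAP's implementation.

```python
from bisect import bisect_right
from typing import List, Tuple

# Assumption: intervals are half-open [start, end) on a single chromosome.
Interval = Tuple[int, int]


def merge_intervals(peaks: List[Interval]) -> List[Interval]:
    """Sort peaks and merge overlapping or touching ones."""
    merged: List[Interval] = []
    for start, end in sorted(peaks):
        if merged and start <= merged[-1][1]:
            # Extend the previous peak instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def reads_under_peak_ratio(reads: List[Interval], peaks: List[Interval]) -> float:
    """Fraction of reads overlapping at least one peak by >= 1 bp.

    Assumed definition for illustration only; not taken from AIAP.
    """
    if not reads:
        return 0.0
    merged = merge_intervals(peaks)
    starts = [s for s, _ in merged]
    hits = 0
    for r_start, r_end in reads:
        # Index of the last merged peak that starts before the read ends.
        i = bisect_right(starts, r_end - 1) - 1
        # Merged peaks are disjoint, so checking this one peak is enough.
        if i >= 0 and merged[i][1] > r_start:
            hits += 1
    return hits / len(reads)


# Toy data (invented for illustration, not from the paper).
toy_peaks = [(100, 200), (150, 300), (500, 600)]
toy_reads = [(90, 110), (300, 350), (550, 560), (700, 750)]
toy_rupr = reads_under_peak_ratio(toy_reads, toy_peaks)
```

With the toy intervals, two of the four reads overlap a merged peak, so `toy_rupr` is 0.5. Real data would need per-chromosome handling and, for paired-end libraries, a decision on whether to count reads or fragments; both are left out of this sketch.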

Keywords: ATAC-seq; Chromatin accessibility; Data visualization; Differential analysis; Quality control.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Chromatin / genetics
  • Chromatin Immunoprecipitation Sequencing*
  • Data Analysis*
  • High-Throughput Nucleotide Sequencing / methods
  • Quality Control
  • Sequence Analysis, DNA / methods

Substances

  • Chromatin