Data exploration, quality control and statistical analysis of ChIP-exo/nexus experiments

Nucleic Acids Res. 2017 Sep 6;45(15):e145. doi: 10.1093/nar/gkx594.

Abstract

ChIP-exo/nexus experiments rely on innovative modifications of the commonly used ChIP-seq protocol for high-resolution mapping of transcription factor binding sites. Although many aspects of ChIP-exo data analysis are similar to those of ChIP-seq, these high-throughput experiments pose a number of unique quality control and analysis challenges. We develop a novel statistical quality control pipeline and accompanying R/Bioconductor package, ChIPexoQual, to enable exploration and analysis of ChIP-exo and related experiments. ChIPexoQual evaluates a number of key issues, including strand imbalance, library complexity, and signal enrichment of data. Assessment of these features is facilitated through diagnostic plots and summary statistics computed over regions of the genome with varying levels of coverage. We evaluated our QC pipeline with both large collections of public ChIP-exo/nexus data and multiple new ChIP-exo datasets from Escherichia coli. ChIPexoQual analysis of these datasets resulted in guidelines for using these QC metrics across a wide range of sequencing depths and provided further insights for modelling ChIP-exo data.
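
The kind of per-region summary described above can be sketched in a few lines. This is a minimal illustration only, written in Python rather than with the ChIPexoQual R API, and the abstract does not define the statistics themselves. The `Read` type, the `region_summary` function, and the three ratios it returns (reads per base as a rough enrichment signal, distinct 5' positions per read as a rough complexity signal, and the fraction of forward-strand reads as a strand-imbalance signal) are assumptions chosen for illustration.

```python
from collections import namedtuple

# Hypothetical minimal read record: 5' end position and strand ("+" or "-").
Read = namedtuple("Read", ["start", "strand"])


def region_summary(reads, width):
    """Summarise one genomic region from the reads aligned to it.

    All three ratios are illustrative assumptions, not the package's
    documented definitions.
    """
    depth = len(reads)
    if depth == 0 or width <= 0:
        raise ValueError("region must have positive width and at least one read")

    forward = sum(1 for r in reads if r.strand == "+")
    # Reads sharing the same 5' position and strand count as one distinct position.
    unique_positions = len({(r.start, r.strand) for r in reads})

    return {
        "depth": depth,
        "coverage_rate": depth / width,            # reads per base (enrichment proxy)
        "unique_rate": unique_positions / depth,   # low values suggest low complexity
        "forward_strand_ratio": forward / depth,   # near 0 or 1 suggests strand imbalance
    }


if __name__ == "__main__":
    example = [Read(10, "+"), Read(10, "+"), Read(12, "-"), Read(15, "+")]
    print(region_summary(example, width=20))
```

Computing such a summary for every covered region and plotting the ratios against region depth is one plausible way to obtain diagnostics over regions with varying levels of coverage.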

Publication types

  • Validation Study

MeSH terms

  • Binding Sites / genetics
  • Chromatin Immunoprecipitation / methods*
  • Chromatin Immunoprecipitation / standards
  • DNA / analysis
  • DNA / metabolism*
  • DNA Ligases / metabolism
  • Data Accuracy*
  • Data Interpretation, Statistical*
  • Datasets as Topic
  • Escherichia coli / genetics
  • Escherichia coli / metabolism
  • Exodeoxyribonucleases / metabolism*
  • High-Throughput Nucleotide Sequencing / methods*
  • High-Throughput Nucleotide Sequencing / standards
  • Oligonucleotide Array Sequence Analysis / methods
  • Oligonucleotide Array Sequence Analysis / standards
  • Protein Binding
  • Quality Control
  • Sequence Analysis, DNA / methods*
  • Sequence Analysis, DNA / standards
  • Software
  • Transcription Factors / metabolism

Substances

  • Transcription Factors
  • DNA
  • Exodeoxyribonucleases
  • DNA Ligases