Detecting and correcting the binding-affinity bias in ChIP-seq data using inter-species information

BMC Genomics. 2016 May 10;17:347. doi: 10.1186/s12864-016-2682-6.

Abstract

Background: Transcriptional gene regulation is a fundamental process in nature, and the experimental and computational investigation of DNA binding motifs and their binding sites is a prerequisite for elucidating this process. ChIP-seq has become the major technology to uncover genomic regions containing those binding sites, but motifs predicted by traditional computational approaches using these data are distorted by a ubiquitous binding-affinity bias. Here, we present an approach for detecting and correcting this bias using inter-species information.

Results: We find that the binding-affinity bias caused by the ChIP-seq experiment in the reference species is stronger than the indirect binding-affinity bias in orthologous regions from phylogenetically related species. We use this difference to develop a phylogenetic footprinting model that is capable of detecting and correcting the binding-affinity bias. We find that this model improves motif prediction and that the corrected motifs are typically softer than those predicted by traditional approaches.
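
The sharpening effect described above can be illustrated with a toy calculation. The sketch below is not the phylogenetic footprinting model of this work. It assumes a hypothetical four-position motif (TRUE_MOTIF), models the binding-affinity bias as a hard cutoff on site probability (min_prob), and reads "softer" as lower information content under a uniform background. All names and numbers are illustrative assumptions.

```python
import itertools
import math

BASES = "ACGT"

# Hypothetical "native" motif: 4 positions, consensus base A with probability 0.7.
TRUE_MOTIF = [[0.7, 0.1, 0.1, 0.1] for _ in range(4)]


def information_content(pwm):
    """Total information content in bits, assuming a uniform background."""
    return sum(
        2.0 + sum(p * math.log2(p) for p in column if p > 0)
        for column in pwm
    )


def site_probability(site, pwm):
    """Probability of one site under a position-independent motif."""
    prob = 1.0
    for column, base in zip(pwm, site):
        prob *= column[BASES.index(base)]
    return prob


def reestimate_from_strong_sites(pwm, min_prob):
    """Re-estimate the motif using only sites with probability >= min_prob.

    This mimics, very crudely, an experiment that recovers only
    high-affinity sites. Each kept site is weighted by its probability.
    """
    counts = [[0.0] * 4 for _ in pwm]
    total = 0.0
    for letters in itertools.product(BASES, repeat=len(pwm)):
        prob = site_probability(letters, pwm)
        if prob < min_prob:
            continue  # weak site: assumed not to be recovered
        total += prob
        for pos, base in enumerate(letters):
            counts[pos][BASES.index(base)] += prob
    return [[c / total for c in column] for column in counts]


# The cutoff 0.01 keeps sites with at most one non-consensus base.
biased_motif = reestimate_from_strong_sites(TRUE_MOTIF, min_prob=0.01)

print("native motif:", round(information_content(TRUE_MOTIF), 3), "bits")
print("biased motif:", round(information_content(biased_motif), 3), "bits")
```

Under these assumptions the motif re-estimated from strong sites alone carries more information than TRUE_MOTIF, that is, it is sharper than the motif that generated the sites. A correction method has to undo this kind of selection effect.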

Conclusions: These findings indicate that motifs published in databases and in the literature are artificially sharpened compared to the native motifs. They also indicate that our current understanding of transcriptional gene regulation might be blurred, but that this understanding can be advanced by taking into account the inter-species information that is available today and that will become increasingly available in the future.

Keywords: Binding-affinity bias; ChIP-seq; Evolution; Gene regulation; Phylogenetic footprinting; Transcription factor binding sites.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Binding Sites*
  • Chromatin Immunoprecipitation*
  • Computational Biology / methods
  • Gene Expression Regulation
  • High-Throughput Nucleotide Sequencing*
  • Humans
  • Models, Genetic
  • Nucleotide Motifs*
  • Reproducibility of Results
  • Transcription Factors* / metabolism

Substances

  • Transcription Factors