ChIP-GSM: Inferring active transcription factor modules to predict functional regulatory elements

PLoS Comput Biol. 2021 Jul 22;17(7):e1009203. doi: 10.1371/journal.pcbi.1009203. eCollection 2021 Jul.

Abstract

Transcription factors (TFs) often function as a module, including both master factors and mediators, binding at cis-regulatory regions to modulate nearby gene transcription. ChIP-seq profiling of multiple TFs makes it feasible to infer functional TF modules. However, when TF modules are inferred from co-localization of ChIP-seq peaks, many weak binding events are often missed, especially for mediators, resulting in incomplete identification of modules. To address this problem, we develop a ChIP-seq data-driven Gibbs Sampler to infer Modules (ChIP-GSM) using a Bayesian framework that integrates ChIP-seq profiles of multiple TFs. ChIP-GSM samples read counts of module TFs iteratively to estimate the binding potential of a module to each region and, across all regions, estimates the module abundance. Using inferred module-region probabilistic bindings as feature units, ChIP-GSM then employs logistic regression to predict active regulatory elements. Validation of ChIP-GSM-predicted regulatory regions on multiple independent datasets sharing the same context confirms the advantage of using TF modules for predicting regulatory activity. In a case study of K562 cells, we demonstrate that the ChIP-GSM-inferred modules form groups, activate gene expression at different time points, and mediate diverse functional cellular processes. Hence, ChIP-GSM infers biologically meaningful TF modules and improves the prediction accuracy of regulatory region activities.
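
The final step described above, using module-region binding probabilities as features for logistic regression, can be illustrated with a minimal sketch. This is not the ChIP-GSM implementation: the data are synthetic, the three "modules" and the rule that marks a region as active are assumptions made for the toy example, and the helper names (fit_logistic, predict_proba) are hypothetical. The Gibbs sampling stage that would produce the probabilities is not shown.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one region.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(X, w, b):
    """Return the predicted probability of activity for each region."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

# Synthetic toy data (NOT from the paper): 200 regions x 3 hypothetical modules,
# each entry standing in for a module-region binding probability in [0, 1).
rng = random.Random(0)
X = [[rng.random() for _ in range(3)] for _ in range(200)]
# Assumed toy labeling rule: a region is "active" when module 0 binds strongly.
y = [1 if xi[0] > 0.5 else 0 for xi in X]

w, b = fit_logistic(X, y)
probs = predict_proba(X, w, b)
accuracy = sum((p > 0.5) == bool(t) for p, t in zip(probs, y)) / len(y)
```

In this toy setup the fitted weight for module 0 should dominate, since only that module determines the labels; with real data, the weights would instead indicate which inferred modules are most associated with regulatory activity.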

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Bayes Theorem
  • Binding Sites / genetics
  • Chromatin / genetics
  • Chromatin / metabolism
  • Chromatin Immunoprecipitation Sequencing / methods*
  • Chromatin Immunoprecipitation Sequencing / statistics & numerical data
  • Computational Biology
  • Enhancer Elements, Genetic
  • Epigenesis, Genetic
  • Gene Expression Regulation
  • Gene Regulatory Networks*
  • Humans
  • K562 Cells
  • MCF-7 Cells
  • Models, Statistical
  • Promoter Regions, Genetic
  • Regulatory Sequences, Nucleic Acid / genetics*
  • Transcription Factors / genetics*
  • Transcription Factors / metabolism*

Substances

  • Chromatin
  • Transcription Factors