A proximity-based graph clustering method for the identification and application of transcription factor clusters

BMC Bioinformatics. 2017 Nov 29;18(1):530. doi: 10.1186/s12859-017-1935-y.

Abstract

Background: Transcription factors (TFs) form a complex regulatory network within the cell that is crucial to cell functioning and human health. While methods for determining where a TF binds to DNA are well established, they provide no information on how TFs interact with one another once bound. TFs tend to bind the genome in clusters, and current methods to identify these clusters are either limited in scope, unable to detect relationships beyond motif similarity, or not applied to TF-TF interactions.

Methods: Here, we present a proximity-based graph clustering approach to identify TF clusters using either ChIP-seq or motif search data. We use TF co-occurrence to construct a filtered, normalized adjacency matrix and use the Markov Clustering Algorithm to partition the graph while maintaining TF-cluster and cluster-cluster interactions. We then apply our graph structure beyond clustering, using it to increase the accuracy of motif-based TFBS searching for an example TF.
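The pipeline described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes binding sites arrive as hypothetical (position, TF index) pairs on a single chromosome, counts two TFs as co-occurring when their sites fall within a hypothetical `window` of base pairs, and omits the paper's filtering step because the abstract does not specify it. The normalization shown is the column normalization that Markov clustering itself requires, followed by the standard expansion and inflation loop; the function names and the `inflation` default are illustrative choices.

```python
import numpy as np

def cooccurrence_matrix(sites, n_tfs, window):
    """Count TF pairs whose sites lie within `window` bp of each other.

    sites: list of (position, tf_index) tuples on one chromosome (assumed format).
    """
    sites = sorted(sites)
    counts = np.zeros((n_tfs, n_tfs))
    for i, (pos_i, tf_i) in enumerate(sites):
        for pos_j, tf_j in sites[i + 1:]:
            if pos_j - pos_i > window:
                break  # sites are sorted, so later ones are even farther away
            if tf_i != tf_j:
                counts[tf_i, tf_j] += 1
                counts[tf_j, tf_i] += 1
    return counts

def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    return m / m.sum(axis=0, keepdims=True)

def markov_cluster(adjacency, inflation=2.0, max_iter=100, tol=1e-6):
    """Basic Markov clustering: alternate expansion and inflation until stable."""
    # Self-loops keep every column sum positive and stabilize the iteration.
    m = normalize_columns(adjacency + np.eye(len(adjacency)))
    for _ in range(max_iter):
        previous = m
        m = np.linalg.matrix_power(m, 2)       # expansion: two-step random walk
        m = normalize_columns(m ** inflation)  # inflation: strengthen strong flows
        if np.abs(m - previous).max() < tol:
            break
    # Each row with surviving mass lists the members of one cluster.
    clusters = set()
    for row in m:
        members = frozenset(np.flatnonzero(row > 1e-6).tolist())
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)

# Toy data: TFs 0, 1, 2 bind together at two loci; TFs 3 and 4 at two others.
sites = [(100, 0), (110, 1), (120, 2), (5000, 0), (5010, 1), (5020, 2),
         (10000, 3), (10010, 4), (20000, 3), (20005, 4)]
counts = cooccurrence_matrix(sites, n_tfs=5, window=50)
clusters = markov_cluster(counts)
print(clusters)
```

On this toy input the two groups of co-binding TFs are recovered as separate clusters, `[[0, 1, 2], [3, 4]]`. Raising `inflation` generally yields smaller, tighter clusters, while lowering it merges them; real data would also need the filtering and normalization choices described in the full paper.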

Results: We show that our method produces small, manageable clusters that encapsulate many known, experimentally validated transcription factor interactions, and that it can capture interactions that motif similarity methods might miss. Our graph structure significantly increases the accuracy of motif-based TFBS searching, demonstrating that the TF-TF connections within the graph correlate with biological TF-TF interactions.

Conclusion: The interactions identified by our method correspond to biological reality and allow for fast exploration of TF clustering and regulatory dynamics.

Keywords: Genome regulation; Graph clustering; Graph theory; Network analysis; TF clusters; Transcription factors.

MeSH terms

  • Algorithms*
  • Chromatin Immunoprecipitation
  • Cluster Analysis
  • DNA / chemistry
  • DNA / isolation & purification
  • DNA / metabolism
  • Gene Regulatory Networks
  • Humans
  • K562 Cells
  • Markov Chains
  • Protein Interaction Maps / genetics
  • Sequence Analysis, DNA
  • Transcription Factors / genetics
  • Transcription Factors / metabolism*

Substances

  • Transcription Factors
  • DNA