Deep Learning of Sequence Patterns for CCCTC-Binding Factor-Mediated Chromatin Loop Formation

J Comput Biol. 2021 Feb;28(2):133-145. doi: 10.1089/cmb.2020.0225. Epub 2020 Nov 25.

Abstract

The three-dimensional (3D) organization of the human genome is of crucial importance for gene regulation, and the CCCTC-binding factor (CTCF) plays an important role in chromatin interactions. However, it is still unclear what sequence patterns, in addition to CTCF motif pairs, determine chromatin loop formation. To discover the underlying sequence patterns, we have developed a deep learning model, called DeepCTCFLoop, to predict whether a chromatin loop can be formed between a pair of convergent or tandem CTCF motifs using only the DNA sequences of the motifs and their flanking regions. Our results suggest that DeepCTCFLoop can accurately distinguish the CTCF motif pairs that form chromatin loops from those that do not. It significantly outperforms CTCF-MP, a machine learning model based on word2vec and boosted trees, when using DNA sequences only. Furthermore, we show that the DNA binding motifs of several transcription factors, including ZNF384, ZNF263, ASCL1, SP1, and ZEB1, may constitute the complex sequence patterns for CTCF-mediated chromatin loop formation. DeepCTCFLoop has also been applied to disease-associated sequence variants to identify candidates that may disrupt chromatin loop formation. Therefore, our results provide useful information for understanding the mechanism of 3D genome organization and may also help annotate and prioritize the noncoding sequence variants associated with human diseases.
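To make the model input described above concrete, the following is a minimal sketch of how a pair of CTCF motif sequences (each with its flanking region) might be turned into a numeric input and labeled by orientation. It is not taken from the DeepCTCFLoop code: the function names, the one-hot scheme, the all-zero encoding of unknown bases, and the simple concatenation of the two sequences are all assumptions made for illustration. The orientation labels follow the common convention that a forward-strand upstream motif paired with a reverse-strand downstream motif is convergent.

```python
# Illustrative sketch only. All names here are hypothetical and are not
# taken from the DeepCTCFLoop implementation.

# One-hot rows for the four bases, in A, C, G, T order (an assumed ordering).
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def one_hot_encode(seq):
    """Encode a DNA string as a list of 4-element rows.

    Bases outside A/C/G/T (for example N) become all zeros (an assumption).
    """
    return [ONE_HOT.get(base, (0, 0, 0, 0)) for base in seq.upper()]


def pair_orientation(upstream_strand, downstream_strand):
    """Label a motif pair from the strands of its upstream and downstream motifs."""
    if upstream_strand == "+" and downstream_strand == "-":
        return "convergent"
    if upstream_strand == "-" and downstream_strand == "+":
        return "divergent"
    return "tandem"  # both motifs on the same strand


def encode_pair(upstream_seq, downstream_seq):
    """Concatenate the encodings of the two motif-plus-flank sequences.

    Concatenation is an assumed way of combining the pair, chosen for simplicity.
    """
    return one_hot_encode(upstream_seq) + one_hot_encode(downstream_seq)


if __name__ == "__main__":
    # Toy sequences; real inputs would be the motifs plus their flanking regions.
    encoded = encode_pair("ACGT", "NNGA")
    print(len(encoded), pair_orientation("+", "-"))
```

Under these assumptions, only convergent and tandem pairs would be passed on to a classifier, matching the scope stated in the abstract; the encoded matrix is the kind of sequence-only input a convolutional model could consume.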

Keywords: 3D genome; CTCF; chromatin loops; deep learning; sequence motifs.

MeSH terms

  • Binding Sites
  • CCCTC-Binding Factor / chemistry
  • CCCTC-Binding Factor / metabolism*
  • Cell Line
  • Chromatin / genetics*
  • Chromatin / metabolism
  • Computational Biology / methods*
  • DNA / chemistry*
  • DNA / metabolism*
  • Deep Learning
  • Genetic Predisposition to Disease
  • HeLa Cells
  • Humans
  • K562 Cells
  • Nucleotide Motifs
  • Sequence Analysis, DNA
  • Transcription Factors / chemistry
  • Transcription Factors / metabolism

Substances

  • CCCTC-Binding Factor
  • CTCF protein, human
  • Chromatin
  • Transcription Factors
  • DNA