Learning Micro-C from Hi-C with diffusion models

PLoS Comput Biol. 2024 May 17;20(5):e1012136. doi: 10.1371/journal.pcbi.1012136. eCollection 2024 May.

Abstract

In the last few years, Micro-C has emerged as an improved alternative to Hi-C. It replaces the restriction enzymes used in Hi-C assays with micrococcal nuclease (MNase), which allows it to capture chromatin interactions at nucleosome resolution. The improved signal-to-noise ratio of Micro-C allows it to detect more chromatin loops than high-resolution Hi-C. However, compared with the massive number of Hi-C datasets available in the literature, only a limited number of Micro-C datasets exist. To take full advantage of these Hi-C datasets, we present HiC2MicroC, a computational method that learns and then predicts Micro-C from Hi-C based on denoising diffusion probabilistic models (DDPM). We trained our DDPM, along with regression models, on the human foreskin fibroblast (HFFc6) cell line and evaluated these methods in six different cell types at 5-kb and 1-kb resolution. Our evaluations demonstrate that both HiC2MicroC and the regression methods markedly improve Hi-C towards Micro-C, and that our DDPM-based HiC2MicroC outperforms regression in several respects. First, HiC2MicroC successfully recovers most of the Micro-C loops, even those not detected in Hi-C maps. Second, a majority of the HiC2MicroC-recovered loops are anchored at CTCF binding sites in a convergent orientation. Third, HiC2MicroC loops share genomic and epigenetic properties with Micro-C loops, including linking promoters and enhancers, and their anchors are enriched for structural proteins (CTCF and cohesin) and histone modifications. Lastly, we find that our recovered loops are also consistent with the loops identified from promoter capture Micro-C (PCMicro-C) and Chromatin Interaction Analysis by Paired-End Tag Sequencing (ChIA-PET). Overall, HiC2MicroC is an effective tool for further studying Hi-C data with Micro-C as a template. HiC2MicroC is publicly available at https://github.com/zwang-bioinformatics/HiC2MicroC/.
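
To make the DDPM idea concrete, the following is a minimal sketch of the standard DDPM forward (noising) step applied to a contact-map patch. It is not taken from the HiC2MicroC code: the function names, the 64x64 patch size, the linear noise schedule, and the 1000-step setting are all illustrative assumptions, and the random arrays only stand in for normalized Micro-C and Hi-C submatrices.

```python
import numpy as np

def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (an assumed choice, not necessarily the authors')."""
    return np.linspace(beta_start, beta_end, num_steps)

def forward_noise(x0, t, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I)."""
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return x_t, noise

betas = linear_beta_schedule()
alpha_bar = np.cumprod(1.0 - betas)  # cumulative product of (1 - beta_t)

rng = np.random.default_rng(0)
# Toy stand-ins for paired, normalized contact-map patches (hypothetical data).
micro_c_patch = rng.random((64, 64))  # target the model should learn to generate
hi_c_patch = rng.random((64, 64))     # conditioning input

# Noise the Micro-C target at an intermediate step.
x_t, noise = forward_noise(micro_c_patch, t=500, alpha_bar=alpha_bar, rng=rng)

# In a conditional DDPM, a denoising network would receive (x_t, hi_c_patch, t)
# and be trained to predict `noise`; that network is omitted here.
```

At prediction time, a model trained this way would start from pure Gaussian noise and denoise it step by step, conditioned on the Hi-C patch, to produce a Micro-C-like patch.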

MeSH terms

  • CCCTC-Binding Factor / genetics
  • CCCTC-Binding Factor / metabolism
  • Cell Line
  • Chromatin* / chemistry
  • Chromatin* / genetics
  • Chromatin* / metabolism
  • Computational Biology* / methods
  • Humans
  • Models, Statistical

Substances

  • Chromatin
  • CCCTC-Binding Factor

Grants and funding

This work was supported by the National Institutes of Health grant [1R35GM137974 to ZW]. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.