Predicting enhancer-gene links from single-cell multi-omics data by integrating prior Hi-C information

Nucleic Acids Res. 2026 May 5;54(9):gkag437. doi: 10.1093/nar/gkag437.

Abstract

Enhancers play an important role in transcriptional regulation by modulating gene expression from distal genomic locations. Although single-cell ATAC and RNA sequencing (scATAC/RNA-seq) data have been leveraged to infer enhancer-gene links, establishing regulatory links between enhancers and their target genes remains a challenge due to the absence of chromatin conformation information. Here, we present SCEG-HiC, a machine learning method based on weighted graphical lasso, which decodes enhancer-gene links from single-cell multi-omics data by integrating bulk average Hi-C as prior knowledge. SCEG-HiC supports both paired scATAC/RNA-seq and scATAC-only inputs, improving prediction accuracy while retaining context-specific correlations and enabling the discovery of biologically relevant links. Comprehensive evaluation across 10 human and mouse single-cell multi-omics datasets shows that SCEG-HiC outperforms existing single-cell models. Application of SCEG-HiC to COVID-19 datasets illustrates its capacity to more reliably reconstruct gene regulatory networks underlying disease severity, and elucidate functional associations between noncoding variants and their putative target genes. SCEG-HiC is freely available as an open-source and user-friendly R package, facilitating broad applications in regulatory genomics research.
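To make the idea of a weighted graphical lasso with a Hi-C prior concrete, here is a minimal Python sketch. It is not the SCEG-HiC implementation, and it does not use the R package's API. It assumes a generic ADMM solver for the graphical lasso with an element-wise penalty matrix. The function names (`weighted_glasso`, `hic_to_penalty`), the linear mapping from contact frequency to penalty, and the toy numbers are all hypothetical. The point is that a pair with strong Hi-C contact receives a small penalty, so its partial correlation survives, while a pair without contact support is shrunk to zero.

```python
import numpy as np


def soft_threshold(a, t):
    """Element-wise soft-thresholding; t may be a matrix of thresholds."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)


def hic_to_penalty(contact, base=0.5):
    """Hypothetical prior: higher Hi-C contact gives a lower L1 penalty.

    This linear mapping is an assumption for illustration only.
    """
    c = contact / contact.max()
    lam = base * (1.0 - c)
    np.fill_diagonal(lam, 0.0)  # leave the diagonal unpenalized
    return lam


def weighted_glasso(S, lam, rho=1.0, n_iter=500, tol=1e-6):
    """ADMM for: min -logdet(Theta) + tr(S Theta) + sum_ij lam_ij |Theta_ij|."""
    p = S.shape[0]
    Z = np.eye(p)
    U = np.zeros((p, p))
    for _ in range(n_iter):
        # Theta-update: closed form via eigendecomposition.
        d, Q = np.linalg.eigh(rho * (Z - U) - S)
        theta_eig = (d + np.sqrt(d ** 2 + 4.0 * rho)) / (2.0 * rho)
        Theta = (Q * theta_eig) @ Q.T
        # Z-update: entry-specific shrinkage driven by the penalty matrix.
        Z_new = soft_threshold(Theta + U, lam / rho)
        # Scaled dual update.
        U += Theta - Z_new
        primal = np.linalg.norm(Theta - Z_new)
        dual = np.linalg.norm(Z_new - Z)
        Z = Z_new
        if primal < tol and dual < tol:
            break
    return Z  # sparse estimate of the precision matrix


# Toy example: feature 0 is a gene, features 1 and 2 are candidate enhancers.
# Both enhancers are equally correlated with the gene in the covariance S.
S = np.array([[1.0, 0.4, 0.4],
              [0.4, 1.0, 0.0],
              [0.4, 0.0, 1.0]])
# Only enhancer 1 has Hi-C contact support with the gene.
contact = np.array([[0.0, 1.0, 0.0],
                    [1.0, 0.0, 0.0],
                    [0.0, 0.0, 0.0]])

Theta = weighted_glasso(S, hic_to_penalty(contact))
print(np.round(Theta, 3))
```

In this toy setting the gene-enhancer 1 entry of the estimated precision matrix stays nonzero, while the gene-enhancer 2 entry is driven to zero, even though both pairs have the same marginal correlation. How SCEG-HiC actually converts Hi-C contacts into penalties is not specified in the abstract.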

MeSH terms

  • Animals
  • COVID-19 / genetics
  • COVID-19 / virology
  • Chromatin / genetics
  • Chromatin Immunoprecipitation Sequencing
  • Enhancer Elements, Genetic*
  • Gene Expression Regulation
  • Gene Regulatory Networks
  • Genomics / methods
  • Humans
  • Machine Learning
  • Mice
  • Multiomics
  • RNA-Seq
  • SARS-CoV-2 / genetics
  • Sequence Analysis, RNA
  • Single-Cell Analysis* / methods

Substances

  • Chromatin