Motivation: Single-cell multi-omics technologies enable the simultaneous profiling of gene expression and chromatin accessibility, providing complementary insights into cellular identity and gene regulatory mechanisms. However, integrating paired scRNA-seq and scATAC-seq data (i.e. profiles from the same single cell) remains challenging due to inherent sparsity, technical noise, and the limited availability of high-quality paired measurements. In contrast, large-scale unpaired scRNA-seq datasets often exhibit robust and biologically meaningful cell cluster structures.
Results: We introduce Guided Co-clustering Transfer (GuidedCoC), a novel framework that transfers structural knowledge from unpaired scRNA-seq source data to improve both cell clustering and feature alignment in paired scRNA-seq/scATAC-seq target data. GuidedCoC jointly co-clusters cells and features across modalities and domains via a unified information-theoretic objective, aligning gene expression modules with regulatory elements while implicitly performing cross-modal dimensionality reduction to reduce noise. Additionally, it automatically aligns cell populations across unpaired and paired datasets without requiring explicit annotations. Extensive experiments on multiple benchmark datasets demonstrate that GuidedCoC achieves superior clustering accuracy and biological interpretability compared to existing methods. These results highlight the promise of structure-guided transfer learning for robust, scalable, and interpretable integration of single-cell multi-omics data.
Availability and implementation: GuidedCoC is available as open-source code at https://github.com/No-AgCl/GuidedCoC.
© The Author(s) 2025. Published by Oxford University Press.