Graph attention network for link prediction of gene regulations from single cell RNA-sequencing data

Bioinformatics. 2022 Aug 12;btac559. doi: 10.1093/bioinformatics/btac559. Online ahead of print.


Motivation: Single-cell RNA sequencing (scRNA-seq) data provide unprecedented opportunities to reconstruct gene regulatory networks (GRNs) at fine-grained resolution. Numerous unsupervised or self-supervised models have been proposed to infer GRNs from bulk RNA-seq data, but few of them are appropriate for scRNA-seq data, which suffer from a low signal-to-noise ratio and dropout. Fortunately, the surge in TF-DNA binding data (e.g., ChIP-seq) makes supervised GRN inference possible. We regard supervised GRN inference as a graph-based link prediction problem that aims to learn low-dimensional vector representations of genes in order to predict potential regulatory interactions.

Results: In this paper, we present GENELink, which uses a graph attention network to infer latent interactions between transcription factors (TFs) and target genes in a GRN. GENELink projects the single-cell gene expression, together with observed TF-gene pairs, into a low-dimensional space. By optimizing the embedding space, it then learns specific gene representations that serve downstream similarity measurement or causal inference on gene pairs. Compared with eight existing GRN reconstruction methods, GENELink achieves comparable or better performance on seven scRNA-seq datasets with four types of ground-truth networks. We further apply GENELink to scRNA-seq data from human breast cancer metastasis and reveal regulatory heterogeneity of the Notch and Wnt signaling pathways between primary tumour and lung metastasis. Moreover, ontology enrichment results for the GRN unique to lung metastasis indicate that mitochondrial oxidative phosphorylation (OXPHOS) is functionally important during the seeding step of the cancer metastatic cascade, which is validated by pharmacological assays.
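
To make the idea of attention-based gene embedding followed by pairwise scoring concrete, the following is a minimal, dependency-free sketch and not the GENELink implementation. It assumes a single attention head, untrained random weights, an undirected prior graph with self-loops, and a sigmoid of the dot product as the similarity score. The names gat_layer and link_score, and the toy expression values, are hypothetical and chosen only for illustration.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def leaky_relu(z, slope=0.2):
    return z if z > 0 else slope * z


def gat_layer(X, neighbors, W, a):
    """One single-head graph attention layer (illustrative assumption).

    X         : per-gene expression vectors (one value per cell)
    neighbors : gene index -> list of neighbour indices, self included
    W         : projection matrix, out_dim x in_dim
    a         : attention vector of length 2 * out_dim
    """
    H = [matvec(W, x) for x in X]  # project every gene to out_dim
    out = []
    for i, h_i in enumerate(H):
        nbrs = neighbors[i]
        # Unnormalised attention score for each neighbour j of gene i.
        scores = [
            leaky_relu(sum(ak * v for ak, v in zip(a, h_i + H[j])))
            for j in nbrs
        ]
        # Numerically stable softmax over the neighbourhood.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # Attention-weighted sum of the neighbours' projected features.
        out.append([
            sum(al * H[j][d] for al, j in zip(alphas, nbrs))
            for d in range(len(h_i))
        ])
    return out


def link_score(z_tf, z_target):
    """Sigmoid of the dot product, used here as a similarity score."""
    dot = sum(p * q for p, q in zip(z_tf, z_target))
    return 1.0 / (1.0 + math.exp(-dot))


# Toy data (made up): 4 genes measured in 5 cells; gene 0 plays the TF.
X = [
    [1.0, 0.0, 2.0, 0.0, 1.0],
    [0.5, 0.0, 1.5, 0.0, 1.0],
    [0.0, 1.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
]
# Observed TF-gene pairs 0-1 and 0-2, treated as undirected, plus self-loops.
neighbors = {0: [0, 1, 2], 1: [1, 0], 2: [2, 0], 3: [3]}

rng = random.Random(0)
out_dim = 2
W = [[rng.uniform(-0.5, 0.5) for _ in range(5)] for _ in range(out_dim)]
a = [rng.uniform(-0.5, 0.5) for _ in range(2 * out_dim)]

Z = gat_layer(X, neighbors, W, a)  # low-dimensional gene embeddings
score = link_score(Z[0], Z[3])     # candidate TF-gene pair (0, 3)
print(f"score for pair (0, 3): {score:.3f}")
```

In a supervised setting, W and a would be fitted by minimising a classification loss on labelled TF-gene pairs. Here they are left at random values, so the printed score has no biological meaning.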

Availability and implementation: The code and data are available at

Supplementary information: Supplementary data are available at Bioinformatics online.