A two-stage GAN-based instrumental variable method for causal analysis of omics data

Brief Bioinform. 2026 Jan 7;27(1):bbag071. doi: 10.1093/bib/bbag071.

Abstract

While considerable progress has been made in identifying candidate genes associated with complex diseases, their potential causal roles in disease etiology remain unknown. Mendelian randomization (MR) uses genetic variants as instrumental variables (IVs) to estimate the causal effects of disease-associated genes, thereby establishing putative causal associations and reducing spurious findings due to confounding. To mitigate the bias that arises in MR studies from violations of the IV conditions and from nonlinear exposure-outcome relations, we propose a two-stage deep learning framework that makes no distributional assumptions about the exposure given the IVs and is flexible enough to capture complex exposure-outcome relations. Specifically, in the first stage we adapt generative adversarial networks (GANs) to estimate the conditional distribution of gene expression given the IVs, and in the second stage we apply deep functional neural networks to learn the causal relationships between gene expression and outcomes. Moreover, the proposed method can handle various data types, such as the expression of multiple genes and multi-omics data. In simulation studies under different distributions and model choices, our proposed GAN-based instrumental variable (GAN-IV) method shows improved performance over the two-stage least squares method, pleiotropy-robust MR methods (e.g. MR-LINK), and state-of-the-art deep-learning-based methods (e.g. DeLIVR). A real data application to the ROSMAP dataset further illustrates that GAN-IV can capture the exposure distribution and the complex nonlinear causal effect between gene expression and disease phenotype. Overall, the proposed GAN-IV framework provides a powerful, distribution-free tool for complex omics data that accounts for unobserved pleiotropy and linkage disequilibrium.
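
For orientation, the two-stage least squares baseline named in the abstract can be sketched as below. This is a minimal illustration of the classical linear two-stage idea only, not the GAN-IV method: the simulated genotypes, effect sizes, sample size, and the names `two_stage_least_squares` and `simulate` are all assumptions made for this example and do not come from the paper.

```python
import numpy as np


def two_stage_least_squares(Z, x, y):
    """Classical 2SLS with a single exposure.

    Stage 1: regress the exposure x on the instruments Z.
    Stage 2: regress the outcome y on the stage-1 fitted exposure.
    Returns the estimated slope of y on x.
    """
    n = Z.shape[0]
    Z1 = np.column_stack([np.ones(n), Z])          # add intercept
    gamma, *_ = np.linalg.lstsq(Z1, x, rcond=None)
    x_hat = Z1 @ gamma                             # genetically predicted exposure
    X1 = np.column_stack([np.ones(n), x_hat])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta[1]


def simulate(n=5000, n_snps=10, true_effect=0.5, seed=0):
    """Toy data (hypothetical): SNP dosages as IVs, one unobserved confounder."""
    rng = np.random.default_rng(seed)
    Z = rng.binomial(2, 0.3, size=(n, n_snps)).astype(float)  # 0/1/2 dosages
    u = rng.normal(size=n)                         # confounds exposure and outcome
    weights = rng.uniform(0.2, 0.5, size=n_snps)   # assumed SNP effects on exposure
    x = Z @ weights + u + rng.normal(size=n)       # exposure (e.g. gene expression)
    y = true_effect * x + u + rng.normal(size=n)   # outcome, linear in x here
    return Z, x, y


Z, x, y = simulate()
ols_estimate = np.polyfit(x, y, 1)[0]              # naive regression, confounded
iv_estimate = two_stage_least_squares(Z, x, y)     # instrumented estimate
print(f"OLS slope: {ols_estimate:.3f}, 2SLS slope: {iv_estimate:.3f}, truth: 0.5")
```

In this toy setting the exposure is linear in the instruments and the outcome is linear in the exposure, so 2SLS is well suited. The abstract's motivation concerns the cases where those conditions fail, which this sketch does not attempt to reproduce.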

Keywords: deep functional neural networks; exposure distribution; generative adversarial networks; nonlinear causal effects.

MeSH terms

  • Algorithms
  • Computational Biology* / methods
  • Deep Learning*
  • Genomics* / methods
  • Humans
  • Mendelian Randomization Analysis* / methods
  • Neural Networks, Computer*