DeepPerVar: a multi-modal deep learning framework for functional interpretation of genetic variants in personal genome

Bioinformatics. 2022 Dec 13;38(24):5340-5351. doi: 10.1093/bioinformatics/btac696.


Motivation: Understanding the functional consequence of genetic variants, especially the non-coding ones, is important but particularly challenging. Genome-wide association studies (GWAS) or quantitative trait locus analyses may be subject to limited statistical power and linkage disequilibrium, and thus are less optimal to pinpoint the causal variants. Moreover, most existing machine-learning approaches, which exploit the functional annotations to interpret and prioritize putative causal variants, cannot accommodate the heterogeneity of personal genetic variations and traits in a population study, targeting a specific disease.

Results: By leveraging paired whole-genome sequencing data and epigenetic functional assays in a population study, we propose a multi-modal deep learning framework to predict genome-wide quantitative epigenetic signals by considering both personal genetic variations and traits. The proposed approach can further evaluate the functional consequence of non-coding variants on an individual level by quantifying the allelic difference of predicted epigenetic signals. By applying the approach to the ROSMAP cohort studying Alzheimer's disease (AD), we demonstrate that the proposed approach can accurately predict quantitative genome-wide epigenetic signals and in key genomic regions of AD causal genes, learn canonical motifs reported to regulate gene expression of AD causal genes, improve the partitioning heritability analysis and prioritize putative causal variants in a GWAS risk locus. Finally, we release the proposed deep learning model as a stand-alone Python toolkit and a web server.

Availability and implementation:

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Deep Learning*
  • Genome-Wide Association Study*
  • Genomics
  • Humans
  • Linkage Disequilibrium
  • Polymorphism, Single Nucleotide
  • Quantitative Trait Loci