TVAR: assessing tissue-specific functional effects of non-coding variants with deep learning

Bioinformatics. 2022 Oct 14;38(20):4697-4704. doi: 10.1093/bioinformatics/btac608.

Abstract

Motivation: Analyzing whole-genome sequencing (WGS) data in genetic studies remains challenging because non-coding variants, especially rare ones, lack accurate functional annotation. Because expression quantitative trait loci (eQTLs) have been extensively implicated in the genetics of human diseases, we hypothesize that rare non-coding variants discovered by WGS play a regulatory role in disease predisposition.

Results: Using thousands of tissue- and cell-type-specific epigenomic features, we propose TVAR, a multi-label deep neural network that predicts the functionality of non-coding variants in the genome based on eQTLs across 49 human tissues in the GTEx project. TVAR learns the relationships between high-dimensional epigenomic features and eQTLs across tissues, taking the correlation among tissues into account to capture both shared and tissue-specific eQTL effects. As a result, TVAR outputs tissue-specific annotations, with an average AUROC of 0.77 across these tissues. We evaluate TVAR's tissue-specific annotations on four complex diseases (coronary artery disease, breast cancer, type 2 diabetes and schizophrenia) and observe superior performance in predicting functional variants, both common and rare, compared with five existing state-of-the-art tools. We further evaluate TVAR's G-score, a scoring scheme across all tissues, on ClinVar, fine-mapped GWAS loci and Massively Parallel Reporter Assay (MPRA) validated variants, and observe consistently better performance than competing tools.
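To make the reported metric concrete, the following is a minimal sketch of how a per-tissue AUROC can be computed for a multi-label predictor and then averaged across tissues. It is an illustration only, not the TVAR implementation: the function names, the toy labels and scores, and the assumption that "average AUROC" means an unweighted mean over tissues are all hypothetical.

```python
from typing import List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n^2) and is meant
    for illustration on small inputs only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auroc_across_tissues(
    y_true: List[List[int]], y_score: List[List[float]]
) -> Tuple[float, List[float]]:
    """Compute one AUROC per tissue (column), then take the unweighted mean.

    Rows are variants and columns are tissues, as in a multi-label setup
    where each variant carries one binary eQTL label per tissue.
    """
    n_tissues = len(y_true[0])
    per_tissue = []
    for t in range(n_tissues):
        labels = [row[t] for row in y_true]
        scores = [row[t] for row in y_score]
        per_tissue.append(auroc(labels, scores))
    return sum(per_tissue) / n_tissues, per_tissue


# Hypothetical toy data: 4 variants x 2 tissues (not taken from GTEx).
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_score = [[0.9, 0.2], [0.3, 0.8], [0.6, 0.4], [0.4, 0.5]]

mean_auc, per_tissue = mean_auroc_across_tissues(y_true, y_score)
print(per_tissue)  # [1.0, 0.75]
print(mean_auc)    # 0.875
```

In this toy case the first tissue is ranked perfectly and the second has one misordered positive/negative pair out of four, so the unweighted mean is 0.875. On real data, a rank-based or library implementation would be preferable to the quadratic pairwise loop.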

Availability and implementation: The TVAR source code and its scores on the ClinVar catalog, fine-mapped GWAS loci, high-confidence eQTLs from the GTEx dataset, and MPRA-validated functional variants are available at https://github.com/haiyang1986/TVAR.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Deep Learning*
  • Diabetes Mellitus, Type 2* / genetics
  • Genome-Wide Association Study / methods
  • Humans
  • Polymorphism, Single Nucleotide
  • Quantitative Trait Loci
  • Software