Explainable multi-task learning improves the parallel estimation of polygenic risk scores for many diseases through shared genetic basis

Adrien Badré; Chongle Pan

doi:10.1371/journal.pcbi.1011211

PLoS Comput Biol. 2023 Jul 7;19(7):e1011211. doi: 10.1371/journal.pcbi.1011211. eCollection 2023 Jul.

Authors

Adrien Badré¹, Chongle Pan¹ ²

Affiliations

¹ School of Computer Science, University of Oklahoma, Norman, Oklahoma, United States of America.
² Department of Microbiology and Plant Biology, University of Oklahoma, Norman, Oklahoma, United States of America.

Abstract

Many complex diseases share common genetic determinants and are comorbid in a population. We hypothesized that the co-occurrences of diseases and their overlapping genetic etiology can be exploited to simultaneously improve multiple diseases' polygenic risk scores (PRS). This hypothesis was tested using a multi-task learning (MTL) approach based on an explainable neural network architecture. We found that parallel estimations of the PRS for 17 prevalent cancers in a pan-cancer MTL model were generally more accurate than independent estimations for individual cancers in comparable single-task learning (STL) models. Such performance improvement conferred by positive transfer learning was also observed consistently for 60 prevalent non-cancer diseases in a pan-disease MTL model. Interpretation of the MTL models revealed significant genetic correlations between the important sets of single nucleotide polymorphisms used by the neural network for PRS estimation. This suggested a well-connected network of diseases with shared genetic basis.
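The abstract does not specify the network architecture, so the following is only a minimal sketch of the general idea of multi-task learning with hard parameter sharing: one shared layer over the genotype inputs feeds one output per disease, and the per-disease losses are summed. It is not the authors' model. The function names, layer sizes, random toy data, and the choice of a summed binary cross-entropy loss are all assumptions made for illustration.

```python
import numpy as np


def init_params(n_snps, n_hidden, n_tasks, seed=0):
    """Hypothetical parameter set: one shared layer plus one output unit per disease."""
    rng = np.random.default_rng(seed)
    return {
        "W_shared": rng.normal(0.0, 0.1, size=(n_snps, n_hidden)),
        "b_shared": np.zeros(n_hidden),
        "W_heads": rng.normal(0.0, 0.1, size=(n_hidden, n_tasks)),
        "b_heads": np.zeros(n_tasks),
    }


def forward(params, X):
    """Return one risk score in (0, 1) per individual (row) and per disease (column)."""
    # Shared representation: every disease head reads the same hidden features.
    hidden = np.maximum(0.0, X @ params["W_shared"] + params["b_shared"])
    # Task-specific heads: column j of W_heads belongs to disease j only.
    logits = hidden @ params["W_heads"] + params["b_heads"]
    return 1.0 / (1.0 + np.exp(-logits))


def multitask_loss(probs, Y, eps=1e-12):
    """Binary cross-entropy averaged over individuals, then summed over diseases."""
    p = np.clip(probs, eps, 1.0 - eps)
    bce = -(Y * np.log(p) + (1.0 - Y) * np.log(1.0 - p))
    return bce.mean(axis=0).sum()


# Toy data (assumed, not from the study): 8 individuals, 20 SNPs coded as
# allele counts 0/1/2, and case/control labels for 3 diseases.
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(8, 20)).astype(float)
Y = rng.integers(0, 2, size=(8, 3)).astype(float)

params = init_params(n_snps=20, n_hidden=5, n_tasks=3)
scores = forward(params, X)
loss = multitask_loss(scores, Y)
```

In this sketch, a single-task baseline corresponds to calling `init_params` with `n_tasks=1` once per disease, so that no hidden features are shared between diseases.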

Copyright: © 2023 Badré, Pan. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

MeSH terms

Genetic Predisposition to Disease / genetics
Humans
Learning*
Multifactorial Inheritance / genetics
Neural Networks, Computer*
Polymorphism, Single Nucleotide / genetics
Risk Factors

Grants and funding

R01 AT011618/AT/NCCIH NIH HHS/United States