Deep neural network improves the estimation of polygenic risk scores for breast cancer

J Hum Genet. 2021 Apr;66(4):359-369. doi: 10.1038/s10038-020-00832-7. Epub 2020 Oct 2.

Abstract

Polygenic risk scores (PRS) estimate the genetic risk of an individual for a complex disease based on many genetic variants across the whole genome. In this study, we compared a series of computational models for estimation of breast cancer PRS. A deep neural network (DNN) was found to outperform alternative machine learning techniques and established statistical algorithms, including BLUP, BayesA, and LDpred. In the test cohort with 50% prevalence, the area under the receiver operating characteristic curve (AUC) was 67.4% for DNN, 64.2% for BLUP, 64.5% for BayesA, and 62.4% for LDpred. BLUP, BayesA, and LDpred all generated PRS that followed a normal distribution in the case population. However, the PRS generated by DNN in the case population followed a bimodal distribution composed of two normal distributions with distinctly different means. This suggests that DNN was able to separate the case population into a high-genetic-risk case subpopulation with an average PRS significantly higher than that of the control population and a normal-genetic-risk case subpopulation with an average PRS similar to that of the control population. This allowed DNN to achieve 18.8% recall at 90% precision in the test cohort with 50% prevalence, which can be extrapolated to 65.4% recall at 20% precision in a general population with 12% prevalence. Interpretation of the DNN model identified salient variants that were assigned insignificant p values by association studies but were important for DNN prediction. These variants may be associated with the phenotype through nonlinear relationships.
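
The dependence of precision on prevalence can be illustrated with a short calculation. The sketch below is not the authors' extrapolation procedure; it is a minimal example that assumes a fixed score threshold, so that recall and the false positive rate stay constant while only the case/control mix changes. The function name `precision_at_prevalence` and its parameters are hypothetical and introduced here for illustration only.

```python
def precision_at_prevalence(recall, precision_ref, prevalence_ref, prevalence_new):
    """Rescale precision to a new prevalence at a fixed score threshold.

    Assumption: recall (true positive rate) and the false positive rate are
    properties of the threshold and do not change with prevalence.
    """
    # True and false positives per unit of the reference cohort
    tp_ref = recall * prevalence_ref
    fp_ref = tp_ref * (1.0 - precision_ref) / precision_ref
    # False positive rate implied by the reference operating point
    fpr = fp_ref / (1.0 - prevalence_ref)
    # Same threshold applied to a population with a different prevalence
    tp_new = recall * prevalence_new
    fp_new = fpr * (1.0 - prevalence_new)
    return tp_new / (tp_new + fp_new)


# Operating point reported in the abstract: 18.8% recall at 90% precision
# in a cohort with 50% prevalence, moved to a population with 12% prevalence.
p_general = precision_at_prevalence(0.188, 0.90, 0.50, 0.12)
print(f"precision at 12% prevalence: {p_general:.3f}")  # about 0.551
```

Under this assumption, the threshold that gives 90% precision in the balanced cohort gives roughly 55% precision at 12% prevalence, with recall unchanged at 18.8%. The figure of 65.4% recall at 20% precision quoted in the abstract has a different recall and therefore corresponds to a different, more permissive threshold; it cannot be derived from this single operating point and would require the full score distributions or ROC curve.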

MeSH terms

  • Algorithms
  • Biomarkers, Tumor / genetics*
  • Breast Neoplasms / genetics*
  • Breast Neoplasms / pathology*
  • Case-Control Studies
  • Female
  • Genetic Predisposition to Disease*
  • Genome-Wide Association Study
  • Humans
  • Multifactorial Inheritance*
  • Neural Networks, Computer*
  • Phenotype
  • Polymorphism, Single Nucleotide*
  • ROC Curve
  • Risk Factors

Substances

  • Biomarkers, Tumor