Genetic analyses of eight complex diseases using predicted continuous representations of disease

Cell Rep Methods. 2025 Aug 18;5(8):101115. doi: 10.1016/j.crmeth.2025.101115. Epub 2025 Jul 25.

Abstract

We evaluated whether predicted continuous disease representations could enhance genetic discovery beyond case-control genome-wide association study (GWAS) phenotypes across eight complex diseases in up to 485,448 UK Biobank participants. Predicted phenotypes had high genetic correlations with case-control phenotypes (median rg = 0.66) but identified more independent associations (median 306 versus 125). While some predicted phenotype associations were spurious, case-control phenotypes boosted by multi-trait analysis of GWAS (MTAG) identified a median of 46 additional variants per disease, of which a median of 73% replicated in FinnGen, 37% reached genome-wide significance in a UK Biobank/FinnGen meta-analysis, and 45% had supporting evidence. Predicted phenotypes also identified 14 genes targeted by phase I-IV drugs that case-control phenotypes did not identify, and combined polygenic risk scores (PRSs) using both phenotypes improved prediction performance, with a median 37% increase in Nagelkerke's R². Predicted phenotypes represent composite biomarkers that complement case-control approaches in genetic discovery, drug target prioritization, and risk prediction, though efficacy varies across diseases.
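The prediction metric reported above, Nagelkerke's R², can be illustrated with a minimal sketch. It uses the standard definition (Cox-Snell R² rescaled by its maximum attainable value) and assumes the log-likelihoods of a null model and a fitted logistic model are already available. The function names and the numbers in the usage note are hypothetical and are not taken from the study, whose exact evaluation pipeline is not described in the abstract.

```python
import math


def log_likelihood(y, p):
    """Bernoulli log-likelihood of binary outcomes y given predicted probabilities p."""
    return sum(
        math.log(pi) if yi == 1 else math.log(1.0 - pi)
        for yi, pi in zip(y, p)
    )


def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke's R^2 from the null and fitted log-likelihoods over n samples."""
    # Cox-Snell R^2: 1 - (L_null / L_model)^(2/n), written with log-likelihoods.
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    # Largest value Cox-Snell R^2 can reach (a perfect model has log-likelihood 0).
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell


def relative_increase(r2_base, r2_combined):
    """Fractional gain of a combined score over a baseline score."""
    return (r2_combined - r2_base) / r2_base
```

For example, with hypothetical values, a case-control PRS with R² = 0.10 and a combined PRS with R² = 0.137 would give `relative_increase(0.10, 0.137)` of about 0.37, that is, a 37% increase.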

Keywords: CP: computational biology; CP: genetics; electronic health records; genome-wide association study; machine learning.

MeSH terms

  • Case-Control Studies
  • Female
  • Genetic Predisposition to Disease*
  • Genome-Wide Association Study* / methods
  • Humans
  • Male
  • Multifactorial Inheritance / genetics
  • Phenotype
  • Polymorphism, Single Nucleotide / genetics