Leveraging genetic ancestry continuum information to interpolate PRS for admixed populations

medRxiv [Preprint]. 2025 Jan 14:2024.11.09.24316996. doi: 10.1101/2024.11.09.24316996.

Abstract

The relatively low representation of admixed populations in both discovery and fine-tuning individual-level datasets limits polygenic risk score (PRS) development and equitable clinical translation for admixed populations. Under the assumption that the most informative PRS weight for a homogeneous sample varies linearly in an ancestry continuum space, we introduce a Genetic Distance-assisted PRS Combination Pipeline for Diverse Genetic Ancestries (DiscoDivas). DiscoDivas interpolates a harmonized PRS for diverse, especially admixed, ancestries by leveraging genetic distance together with multiple PRS weights fine-tuned within single-ancestry samples. It treats ancestry as a continuous variable and does not require switching between models when calculating PRS for different ancestries. We generated PRS with DiscoDivas and with the current conventional method, i.e., fine-tuning PRS based on multiple GWAS using samples of matched or similar ancestry. DiscoDivas generated a harmonized PRS with accuracy comparable to or higher than that of the conventional approach, with the greatest advantage observed in admixed individuals.
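The interpolation idea can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so the inverse-distance scheme below is an assumption for illustration and is not the DiscoDivas implementation. The function name `interpolate_prs`, the use of principal-component coordinates as the ancestry space, and the cohort centroids are all hypothetical. With two cohorts and a target lying on the line between them, inverse-distance weights reduce to ordinary linear interpolation, which matches the linearity assumption stated above.

```python
import math


def interpolate_prs(target_pcs, cohort_centroids, cohort_prs, eps=1e-12):
    """Combine ancestry-specific PRS for one individual.

    ASSUMPTION: weights are inversely proportional to the Euclidean distance
    between the individual and each fine-tuning cohort's centroid in a
    continuous ancestry space (e.g. genetic principal components). This is an
    illustrative scheme, not the formula used by DiscoDivas.
    """
    distances = [math.dist(target_pcs, c) for c in cohort_centroids]

    # An individual sitting exactly on a centroid gets that cohort's PRS.
    for d, score in zip(distances, cohort_prs):
        if d < eps:
            return score

    inverse = [1.0 / d for d in distances]
    total = sum(inverse)
    # Normalized weights sum to 1, so the result stays on the PRS scale.
    return sum((w / total) * score for w, score in zip(inverse, cohort_prs))


# Hypothetical example: two single-ancestry cohorts and one admixed individual.
centroids = [(0.0, 0.0), (4.0, 0.0)]   # cohort positions in PC space
scores = [2.0, -2.0]                   # the individual's PRS under each cohort's tuned weights
admixed = (1.0, 0.0)                   # closer to the first cohort
print(interpolate_prs(admixed, centroids, scores))  # about 1.0 (weights 0.75 and 0.25)
```

Because the weights vary smoothly with position, the same function applies to every individual, which mirrors the point that no switching between ancestry-specific models is needed.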

Publication types

  • Preprint