Haplotype and population structure inference using neural networks in whole-genome sequencing data
- PMID: 35794006
- PMCID: PMC9435741
- DOI: 10.1101/gr.276813.122
Abstract
Accurate inference of population structure is important in many studies of population genetics. Here we present HaploNet, a method for performing dimensionality reduction and clustering of genetic data. The method is based on local clustering of phased haplotypes using neural networks from whole-genome sequencing or dense genotype data. By using Gaussian mixtures in a variational autoencoder framework, we are able to learn a low-dimensional latent space in which we cluster haplotypes along the genome in a highly scalable manner. We show that we can use haplotype clusters in the latent space to infer global population structure using haplotype information by exploiting the generative properties of our framework. Based on fitted neural networks and their latent haplotype clusters, we can perform principal component analysis and estimate ancestry proportions based on a maximum likelihood framework. Using sequencing data from simulations and closely related human populations, we show that our approach is better at distinguishing closely related populations than standard admixture and principal component analysis software. We further show that HaploNet is fast and highly scalable by applying it to genotype array data of the UK Biobank.
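The maximum likelihood estimation of ancestry proportions mentioned above can be illustrated with a small expectation-maximization (EM) sketch. This is not the HaploNet implementation: it assumes, for simplicity, that the likelihood of each windowed haplotype under each ancestral source is already known, and it estimates proportions for a single individual only. The function name `estimate_ancestry_proportions` and the toy likelihood values are hypothetical and chosen purely for illustration.

```python
def estimate_ancestry_proportions(window_likelihoods, n_iter=200, tol=1e-10):
    """EM estimate of mixture proportions q for one individual.

    window_likelihoods[w][k] is an assumed, precomputed likelihood of the
    haplotype observed in genomic window w given ancestral source k.
    """
    n_windows = len(window_likelihoods)
    n_sources = len(window_likelihoods[0])
    q = [1.0 / n_sources] * n_sources  # start from uniform proportions
    for _ in range(n_iter):
        totals = [0.0] * n_sources
        for lik in window_likelihoods:
            # E-step: posterior probability that this window derives from source k
            weighted = [q[k] * lik[k] for k in range(n_sources)]
            norm = sum(weighted)
            for k in range(n_sources):
                totals[k] += weighted[k] / norm
        # M-step: proportions are the average posterior across windows
        new_q = [t / n_windows for t in totals]
        converged = max(abs(a - b) for a, b in zip(q, new_q)) < tol
        q = new_q
        if converged:
            break
    return q


# Toy input (made up): 7 windows favour source 0, 3 windows favour source 1.
toy = [[0.9, 0.1]] * 7 + [[0.1, 0.9]] * 3
q_hat = estimate_ancestry_proportions(toy)
print(q_hat)  # the maximum likelihood solution for this toy input is q = [0.75, 0.25]
```

The estimate is 0.75 rather than the raw 7/10 split because the toy likelihoods are not fully decisive, so each window contributes a fractional assignment. In a full model the per-source haplotype cluster frequencies would be estimated jointly with the proportions rather than supplied as fixed input.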
© 2022 Meisner and Albrechtsen; Published by Cold Spring Harbor Laboratory Press.
Similar articles
- De novo inference of stratification and local admixture in sequencing studies. BMC Bioinformatics. 2013;14 Suppl 5(Suppl 5):S17. doi: 10.1186/1471-2105-14-S5-S17. Epub 2013 Apr 10. PMID: 23734678 Free PMC article.
- Improving population scale statistical phasing with whole-genome sequencing data. PLoS Genet. 2024 Jul 3;20(7):e1011092. doi: 10.1371/journal.pgen.1011092. eCollection 2024 Jul. PMID: 38959269 Free PMC article.
- Modeling Human Population Separation History Using Physically Phased Genomes. Genetics. 2017 Jan;205(1):385-395. doi: 10.1534/genetics.116.192963. Epub 2016 Nov 9. PMID: 28049708 Free PMC article.
- Hybrid autoencoder with orthogonal latent space for robust population structure inference. Sci Rep. 2023 Feb 14;13(1):2612. doi: 10.1038/s41598-023-28759-x. PMID: 36788253 Free PMC article.
- Haplotype analysis in population genetics and association studies. Pharmacogenomics. 2003 Mar;4(2):171-8. doi: 10.1517/phgs.4.2.171.22636. PMID: 12605551 Review.
Cited by
- The genomic footprint of social stratification in admixing American populations. Elife. 2023 Dec 1;12:e84429. doi: 10.7554/eLife.84429. PMID: 38038347 Free PMC article.
- dnadna: a deep learning framework for population genetics inference. Bioinformatics. 2023 Jan 1;39(1):btac765. doi: 10.1093/bioinformatics/btac765. PMID: 36445000 Free PMC article.
- Inference of Coalescence Times and Variant Ages Using Convolutional Neural Networks. Mol Biol Evol. 2023 Oct 4;40(10):msad211. doi: 10.1093/molbev/msad211. PMID: 37738175 Free PMC article.
- Harnessing deep learning for population genetic inference. Nat Rev Genet. 2024 Jan;25(1):61-78. doi: 10.1038/s41576-023-00636-3. Epub 2023 Sep 4. PMID: 37666948 Review.
- Quantitative evaluation of nonlinear methods for population structure visualization and inference. G3 (Bethesda). 2022 Aug 25;12(9):jkac191. doi: 10.1093/g3journal/jkac191. PMID: 35900169 Free PMC article.