High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios
- PMID: 36055201
- PMCID: PMC9439720
- DOI: 10.1016/j.cell.2022.08.004
Abstract
The 1000 Genomes Project (1kGP) is the largest fully open resource of whole-genome sequencing (WGS) data consented for public distribution without access or use restrictions. The final, phase 3 release of the 1kGP included 2,504 unrelated samples from 26 populations and was based primarily on low-coverage WGS. Here, we present a high-coverage 3,202-sample WGS 1kGP resource, which now includes 602 complete trios, sequenced to a depth of 30X using Illumina. We performed single-nucleotide variant (SNV) and short insertion and deletion (INDEL) discovery and generated a comprehensive set of structural variants (SVs) by integrating multiple analytic methods through a machine learning model. We show gains in sensitivity and precision of variant calls compared to phase 3, especially among rare SNVs, as well as among INDELs and SVs across the frequency spectrum. We also generated an improved reference imputation panel, making the variants discovered here accessible for association studies.
Keywords: 1000 Genomes Project; INDEL; SNV; population genetics; reference imputation panel; structural variation; trio sequencing; whole-genome sequencing.
Copyright © 2022 The Authors. Published by Elsevier Inc. All rights reserved.
Conflict of interest statement
Declaration of interests: E.E.E. is a scientific advisory board (SAB) member of Variant Bio, Inc. P.F. is an SAB member of Fabric Genomics, Inc., and Eagle Genomics, Ltd.
Comment in
- 1000 Genomes Project phase 4: The gift that keeps on giving. Cell. 2022 Sep 1;185(18):3286-3289. doi: 10.1016/j.cell.2022.08.001. PMID: 36055197 Clinical Trial.
Grants and funding
- R01 HG002898/HG/NHGRI NIH HHS/United States
- R03 HD099547/HD/NICHD NIH HHS/United States
- R35 GM138212/GM/NIGMS NIH HHS/United States
- UM1 HG008895/HG/NHGRI NIH HHS/United States
- UM1 HG008901/HG/NHGRI NIH HHS/United States
- R01 HD081256/HD/NICHD NIH HHS/United States
- R56 MH115957/MH/NIMH NIH HHS/United States
- WT_/Wellcome Trust/United Kingdom
- U24 HG007497/HG/NHGRI NIH HHS/United States
- R21 CA259309/CA/NCI NIH HHS/United States
- R01 MH115957/MH/NIMH NIH HHS/United States
- R01 CA261934/CA/NCI NIH HHS/United States
- UM1 HG008853/HG/NHGRI NIH HHS/United States
