Accurate, scalable cohort variant calls using DeepVariant and GLnexus
- PMID: 33399819
- PMCID: PMC8023681
- DOI: 10.1093/bioinformatics/btaa1081
Abstract
Motivation: Population-scale sequenced cohorts are foundational resources for genetic analyses, but processing raw reads into analysis-ready cohort-level variants remains challenging.
Results: We introduce an open-source cohort-calling method that uses the highly accurate caller DeepVariant and the scalable merging tool GLnexus. Using callset quality metrics based on variant recall and precision in benchmark samples and on Mendelian consistency in father-mother-child trios, we optimize the method across a range of cohort sizes, sequencing methods and sequencing depths. The resulting callsets show consistent quality improvements over those generated using existing best practices, and they are produced at reduced cost. We further evaluate our pipeline in the deeply sequenced 1000 Genomes Project (1KGP) samples and show superior callset quality metrics and imputation reference panel performance compared with an independently generated GATK Best Practices pipeline.
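To make the quality metrics named above concrete, here is a minimal Python sketch of how precision, recall and a trio Mendelian violation rate could be computed. It is an illustration under stated assumptions, not the authors' evaluation code: genotypes are assumed to be diploid, autosomal and fully called, and the function names (`precision_recall`, `is_mendelian_consistent`, `mendelian_violation_rate`) are hypothetical. The paper's exact metric definitions may differ, for example in how missing calls or sex chromosomes are handled.

```python
from itertools import product
from typing import Iterable, Tuple

# Diploid genotype as a pair of allele indices, e.g. (0, 1) for a heterozygous call.
# Assumption: autosomal sites with no missing alleles.
Genotype = Tuple[int, int]
Trio = Tuple[Genotype, Genotype, Genotype]  # (father, mother, child)


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision and recall from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def is_mendelian_consistent(father: Genotype, mother: Genotype, child: Genotype) -> bool:
    """True if the child genotype can be formed from one paternal and one maternal allele."""
    child_sorted = tuple(sorted(child))
    return any(
        tuple(sorted((f_allele, m_allele))) == child_sorted
        for f_allele, m_allele in product(father, mother)
    )


def mendelian_violation_rate(trios: Iterable[Trio]) -> float:
    """Fraction of trio genotype triples that violate Mendelian inheritance."""
    trio_list = list(trios)
    if not trio_list:
        return 0.0
    violations = sum(not is_mendelian_consistent(*trio) for trio in trio_list)
    return violations / len(trio_list)


if __name__ == "__main__":
    # Hypothetical counts and genotypes, for illustration only.
    print(precision_recall(tp=90, fp=10, fn=30))
    example_trios = [
        ((0, 0), (0, 0), (0, 1)),  # child carries an allele neither parent has
        ((0, 1), (0, 0), (0, 1)),  # consistent with inheritance
    ]
    print(mendelian_violation_rate(example_trios))
```

A lower violation rate across trios suggests fewer genotyping errors in the merged callset, which is why it can serve as a quality signal even for samples that lack a benchmark truth set.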
Availability and implementation: We publicly release the 1KGP individual-level variant calls and cohort callset (https://console.cloud.google.com/storage/browser/brain-genomics-public/research/cohort/1KGP) to foster additional development and evaluation of cohort merging methods as well as broad studies of genetic variation. Both DeepVariant (https://github.com/google/deepvariant) and GLnexus (https://github.com/dnanexus-rnd/GLnexus) are open-source, and the optimized GLnexus setup discovered in this study is also integrated into GLnexus public releases v1.2.2 and later.
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author(s) 2021. Published by Oxford University Press.
