Here, we describe a worldwide haplotype map for soybean (GmHapMap) constructed using whole-genome sequence data for 1007 Glycine max accessions and yielding 14.9 million variants as well as 4.3 M tag single-nucleotide polymorphisms (SNPs). When sampling random subsets of these accessions, the number of variants and tag SNPs plateaued beyond approximately 800 and 600 accessions, respectively. This suggests extensive coverage of diversity within the cultivated soybean. GmHapMap variants were imputed onto 21 618 previously genotyped accessions with up to 96% success for common alleles. A local association analysis was performed with the imputed data using markers located in a 1-Mb region known to contribute to seed oil content and enabled us to identify a candidate causal SNP residing in the NPC1 gene. We determined gene-centric haplotypes (407 867 GCHs) for the 55 589 genes and showed that such haplotypes can help to identify alleles that differ in the resulting phenotype. Finally, we predicted 18 031 putative loss-of-function (LOF) mutations in 10 662 genes and illustrated how such a resource can be used to explore gene function. The GmHapMap provides a unique worldwide resource for applied soybean genomics and breeding.
Keywords: genetic variants; haplotype; haplotype map; imputation; loss-of-function mutation; soybean; whole-genome sequencing.
© 2020 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.