The pan-genome of the cultivated soybean (PanSoy) reveals an extraordinarily conserved gene content

Plant Biotechnol J. 2021 Sep;19(9):1852-1862. doi: 10.1111/pbi.13600. Epub 2021 Jun 15.

Abstract

Studies on structural variation in plants have revealed the inadequacy of a single reference genome for an entire species and suggest that it is necessary to build a species-representative genome, called a pan-genome, to better capture the extent of both structural and nucleotide variation. Here, we present a pan-genome of cultivated soybean (Glycine max), termed PanSoy, constructed using the de novo genome assembly of 204 phylogenetically and geographically representative improved accessions selected from the larger GmHapMap collection. PanSoy uncovers 108 Mb (~11%) of novel nonreference sequences encompassing 3621 protein-coding genes (including 1659 novel genes) absent from the soybean 'Williams 82' reference genome. Nonetheless, the core genome represents an exceptionally large proportion of the genome, with >90.6% of genes being shared by >99% of the accessions. A majority of the presence/absence variants (PAVs) encompassing genes could be confirmed with long-read sequencing on a subset of accessions. PanSoy is a major step towards capturing the extent of genetic variation in cultivated soybean and provides a resource for soybean genomics research and breeding.
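
The core-genome criterion stated in the abstract (a gene shared by >99% of accessions) can be illustrated with a minimal sketch. It assumes a gene-by-accession presence/absence matrix is already available. The function names, the `core_threshold` parameter, and the toy data are hypothetical and are not taken from the PanSoy pipeline.

```python
from typing import Dict, List


def classify_genes(pav: Dict[str, List[int]],
                   core_threshold: float = 0.99) -> Dict[str, str]:
    """Label each gene 'core' or 'dispensable'.

    pav maps a gene ID to a list of 0/1 calls, one per accession
    (1 = gene present). A gene is called core when the fraction of
    accessions carrying it is strictly greater than core_threshold,
    mirroring the ">99% of the accessions" wording of the abstract.
    """
    labels = {}
    for gene, calls in pav.items():
        if not calls:
            raise ValueError(f"no accession calls for gene {gene!r}")
        frequency = sum(calls) / len(calls)
        labels[gene] = "core" if frequency > core_threshold else "dispensable"
    return labels


def core_fraction(labels: Dict[str, str]) -> float:
    """Return the fraction of genes labelled 'core'."""
    return sum(1 for label in labels.values() if label == "core") / len(labels)


# Hypothetical toy matrix: 4 genes across 204 accessions (illustrative only).
toy_pav = {
    "geneA": [1] * 204,              # present in every accession
    "geneB": [1] * 202 + [0] * 2,    # present in 202 of 204 accessions
    "geneC": [1] * 150 + [0] * 54,   # present in 150 of 204 accessions
    "geneD": [1] * 1 + [0] * 203,    # present in a single accession
}

toy_labels = classify_genes(toy_pav)
toy_core_fraction = core_fraction(toy_labels)
```

With 204 accessions, a strict >99% cutoff means a gene may be missing from at most two accessions and still count as core, which is why `geneB` in the toy matrix falls on the core side of the threshold.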

Keywords: GmHapMap; de novo assembly; genic PAV; long-read sequencing; pan-genome; soybean.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Fabaceae*
  • Genome, Plant / genetics
  • Genomics
  • Glycine max* / genetics
  • Plant Breeding