Identifying genetic variants underlying phenotypic variation in plants without complete genomes

Nat Genet. 2020 May;52(5):534-540. doi: 10.1038/s41588-020-0612-7. Epub 2020 Apr 13.


Structural variants and presence/absence polymorphisms are common in plant genomes, yet they are routinely overlooked in genome-wide association studies (GWAS). Here, we expand the type of genetic variants detected in GWAS to include major deletions, insertions and rearrangements. We first use raw sequencing data directly to derive short sequences, k-mers, that mark a broad range of polymorphisms independently of a reference genome. We then link k-mers associated with phenotypes to specific genomic regions. Using this approach, we reanalyzed 2,000 traits in Arabidopsis thaliana, tomato and maize populations. Associations identified with k-mers recapitulate those found with SNPs, but with stronger statistical support. Importantly, we discovered new associations with structural variants and with regions missing from reference genomes. Our results demonstrate the power of performing GWAS before linking sequence reads to specific genomic regions, which allows the detection of a wider range of genetic variants responsible for phenotypic variation.
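
The reference-free step described above can be illustrated with a minimal sketch. It assumes that each accession is represented by a list of raw reads, that a k-mer is scored only as present or absent in an accession, and that a k-mer and its reverse complement are treated as the same sequence. The function names, the toy reads and phenotype values, and the small k are hypothetical and chosen for readability. The score at the end is a plain difference in mean phenotype between carriers and non-carriers, a stand-in for a proper association test rather than the statistical model used in the study.

```python
from statistics import mean

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmers_in_reads(reads, k):
    """Collect the canonical k-mers seen in a list of reads, skipping ambiguous bases."""
    found = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                found.add(canonical(kmer))
    return found


def presence_absence(samples, k):
    """Build {kmer: {sample: bool}} without consulting any reference genome."""
    per_sample = {name: kmers_in_reads(reads, k) for name, reads in samples.items()}
    all_kmers = sorted(set().union(*per_sample.values()))
    return {
        kmer: {name: kmer in kmers for name, kmers in per_sample.items()}
        for kmer in all_kmers
    }


def mean_difference(row, phenotype):
    """Mean phenotype of carriers minus non-carriers; None if the k-mer does not vary."""
    carriers = [phenotype[s] for s, present in row.items() if present]
    others = [phenotype[s] for s, present in row.items() if not present]
    if not carriers or not others:
        return None
    return mean(carriers) - mean(others)


# Toy data (hypothetical): two accessions carry an A and two carry a T at one site.
samples = {
    "acc1": ["ACGTACGGATT"],
    "acc2": ["ACGTACGGATT"],
    "acc3": ["ACGTTCGGATT"],
    "acc4": ["ACGTTCGGATT"],
}
phenotype = {"acc1": 10.0, "acc2": 12.0, "acc3": 20.0, "acc4": 22.0}

table = presence_absence(samples, k=5)
scores = {kmer: mean_difference(row, phenotype) for kmer, row in table.items()}

# Report only the k-mers whose presence varies across accessions.
for kmer, score in scores.items():
    if score is not None:
        print(kmer, score)
```

Only k-mers that overlap the variable site differ between accessions, so they are the ones that receive a score. Placing such k-mers in a genomic context, for example by locating the reads they came from, would be a separate, later step.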

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biological Variation, Population
  • Genome, Plant / genetics*
  • Genome-Wide Association Study / methods
  • Genomics / methods
  • High-Throughput Nucleotide Sequencing / methods
  • Phenotype
  • Polymorphism, Single Nucleotide / genetics*
  • Sequence Analysis, DNA / methods
  • Solanum lycopersicum / genetics
  • Zea mays / genetics