PolyCRACKER, a robust method for the unsupervised partitioning of polyploid subgenomes by signatures of repetitive DNA evolution

BMC Genomics. 2019 Jul 12;20(1):580. doi: 10.1186/s12864-019-5828-5.

Abstract

Background: Our understanding of polyploid genomes is limited by our inability to definitively assign sequences to a specific subgenome without extensive prior knowledge such as high-resolution genetic maps or genome sequences of diploid progenitors. In theory, existing methods for assigning sequences to individual species from metagenome samples could be used to separate subgenomes in polyploid organisms; however, these methods rely on differences in coarse genome properties such as GC content, or on sequences from related species. Thus, these approaches do not work for subgenomes whose gross features are indistinguishable and for which related genomes are lacking. Here we describe a method that uses rapidly evolving repetitive DNA to circumvent these limitations.

Results: By using short, repetitive DNA sequences as species-specific signals, we separated closely related genomes from test datasets and subgenomes from two polyploid plants, tobacco and wheat, without any prior knowledge.
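The idea of using short repetitive sequences as a subgenome signal can be illustrated with a toy sketch. This is not the PolyCRACKER implementation: the function names, the parameters `k` and `min_count`, the scaffold sequences, and the nearest-seed grouping step are all assumptions made for illustration, and a real analysis would likely use longer k-mers and a proper clustering method. The two invented repeat units both have 50% GC content, so GC content alone could not tell the toy scaffolds apart.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count every overlapping k-mer in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def repeat_kmers(chunks, k, min_count):
    """Return k-mers seen at least min_count times across all chunks.

    The threshold is a crude (assumed) proxy for 'repetitive'.
    """
    total = Counter()
    for seq in chunks.values():
        total.update(kmer_counts(seq, k))
    return sorted(km for km, n in total.items() if n >= min_count)

def profile(seq, kmers, k):
    """Unit-length vector of repeat k-mer counts for one chunk."""
    counts = kmer_counts(seq, k)
    vec = [counts[km] for km in kmers]
    norm = sum(v * v for v in vec) ** 0.5
    return [v / norm for v in vec] if norm else vec

def cosine(a, b):
    """Cosine similarity of two already-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))

def partition(chunks, k=5, min_count=3):
    """Split chunks into two groups by repeat k-mer profile.

    Grouping here is a simple stand-in: take the first chunk as one
    seed, the chunk least similar to it as the other seed, and assign
    every chunk to the more similar seed.
    """
    kmers = repeat_kmers(chunks, k, min_count)
    profiles = {name: profile(seq, kmers, k) for name, seq in chunks.items()}
    names = sorted(profiles)
    seed_a = names[0]
    seed_b = min(names[1:], key=lambda n: cosine(profiles[seed_a], profiles[n]))
    groups = {}
    for n in names:
        to_a = cosine(profiles[n], profiles[seed_a])
        to_b = cosine(profiles[n], profiles[seed_b])
        groups[n] = 0 if to_a >= to_b else 1
    return groups

# Invented toy scaffolds: two carry repeat "ACGTTGCA", two carry
# repeat "GGATCCTA" (both repeat units are 50% GC).
chunks = {
    "scaffold1": "ACGTTGCA" * 5 + "TTGACC",
    "scaffold2": "CCATG" + "ACGTTGCA" * 4,
    "scaffold3": "GGATCCTA" * 5 + "TTGACC",
    "scaffold4": "CCATG" + "GGATCCTA" * 4,
}

groups = partition(chunks)
for name in sorted(groups):
    print(name, groups[name])
```

In this sketch the shared flanking sequences fall below the `min_count` threshold, so only the repeat-derived k-mers contribute to each profile, and scaffolds carrying the same repeat family end up in the same group.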

Conclusion: This approach is ideal for separating the subgenomes of polyploid species with unsequenced or unknown progenitor genomes.

Keywords: Allopolyploid; Binning; Evolution; K-mer; Repetitive DNA; Subgenome; Tobacco; Transposon; Wheat.

MeSH terms

  • DNA, Plant / genetics*
  • Evolution, Molecular*
  • Genome, Plant / genetics
  • Genomics / methods*
  • Phylogeny
  • Polyploidy*
  • Repetitive Sequences, Nucleic Acid / genetics*
  • Tobacco / genetics
  • Triticum / genetics
  • Unsupervised Machine Learning*

Substances

  • DNA, Plant