Haploid, diploid, and pooled exome capture recapitulate features of biology and paralogy in two non-model tree species

Mol Ecol Resour. 2022 Jan;22(1):225-238. doi: 10.1111/1755-0998.13474. Epub 2021 Aug 14.

Abstract

Despite their suitability for studying evolution, many conifer species have large and repetitive giga-genomes (16-31 Gbp) that create hurdles to producing high coverage SNP data sets that capture diversity from across the entirety of the genome. Due in part to multiple ancient whole genome duplication events, gene family expansion and subsequent evolution within Pinaceae, false diversity from the misalignment of paralog copies creates further challenges in accurately and reproducibly inferring evolutionary history from sequence data. Here, we leverage the cost-saving benefits of pool-seq and exome-capture to discover SNPs in two conifer species, Douglas-fir (Pseudotsuga menziesii var. menziesii (Mirb.) Franco, Pinaceae) and jack pine (Pinus banksiana Lamb., Pinaceae). We show, using minimal baseline filtering, that allele frequencies estimated from pooled individuals show a strong, positive correlation with those estimated by sequencing the same population as individuals (r > .948), on par with such comparisons made in model organisms. Further, we highlight the utility of haploid megagametophyte tissue for identifying sites that are probably due to misaligned paralogs. Together with additional minor filtering, we show that it is possible to remove many of the loci with large frequency estimate discrepancies between individual and pooled sequencing approaches, improving the correlation further (r > .973). Our work addresses bioinformatic challenges in non-model organisms with large and complex genomes, highlights the use of megagametophyte tissue for the identification of paralogous artefacts, and suggests the combination of pool-seq and exome capture to be robust for further evolutionary hypothesis testing in these systems.
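The two checks described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, input layouts, and the `max_het_fraction` threshold are hypothetical and chosen only for illustration. The first part compares allele frequencies estimated from pooled reads with those from individually genotyped diploids using Pearson's r. The second part flags sites where haploid megagametophyte samples appear heterozygous. Haploid tissue should carry one allele per site, so such calls suggest reads from paralogous copies aligned to the same position.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def individual_allele_freq(genotypes):
    """Alternate-allele frequency from diploid genotypes.

    genotypes: alternate-allele counts per individual (0, 1, or 2),
    with None marking a missing call (assumed input layout).
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    return sum(called) / (2 * len(called))


def pooled_allele_freq(alt_reads, total_reads):
    """Alternate-allele frequency from read counts in a pooled library."""
    if total_reads <= 0:
        return None
    return alt_reads / total_reads


def flag_putative_paralog_sites(haploid_calls, max_het_fraction=0.0):
    """Return sites where haploid samples look heterozygous too often.

    haploid_calls: dict mapping a site ID to a list of booleans, True when a
    haploid sample was called heterozygous at that site (assumed layout).
    max_het_fraction: hypothetical tolerance; 0.0 flags any heterozygous call.
    """
    flagged = set()
    for site, calls in haploid_calls.items():
        if calls and sum(calls) / len(calls) > max_het_fraction:
            flagged.add(site)
    return flagged
```

In use, one would compute both frequency estimates per site, drop the flagged sites, and recompute `pearson_r` on the remainder to see whether the agreement between the pooled and individual estimates improves.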

Keywords: Pinaceae; exome-capture; non-model; paralogy; pool-seq.

MeSH terms

  • Animals
  • Biology
  • Diploidy*
  • Exome
  • Haploidy
  • Humans
  • Sheep
  • Trees*