Proc Natl Acad Sci U S A. 2021 Aug 31;118(35):e2102914118.
doi: 10.1073/pnas.2102914118.

Genomic structural variants constrain and facilitate adaptation in natural populations of Theobroma cacao, the chocolate tree

Tuomas Hämälä et al. Proc Natl Acad Sci U S A.

Abstract

Genomic structural variants (SVs) can play important roles in adaptation and speciation. Yet the overall fitness effects of SVs are poorly understood, partly because accurate population-level identification of SVs requires multiple high-quality genome assemblies. Here, we use 31 chromosome-scale, haplotype-resolved genome assemblies of Theobroma cacao (an outcrossing, long-lived tree species that is the source of chocolate) to investigate the fitness consequences of SVs in natural populations. Among the 31 accessions, we find over 160,000 SVs, which together cover eight times more of the genome than single-nucleotide polymorphisms and short indels (125 versus 15 Mb). Our results indicate that a vast majority of these SVs are deleterious: they segregate at low frequencies and are depleted from functional regions of the genome. We show that SVs influence gene expression, which likely impairs gene function and contributes to the detrimental effects of SVs. We also provide empirical support for a theoretical prediction that SVs, particularly inversions, increase genetic load through the accumulation of deleterious nucleotide variants as a result of suppressed recombination. Despite the overall detrimental effects, we identify individual SVs bearing signatures of local adaptation, several of which are associated with genes differentially expressed between populations. Genes involved in pathogen resistance are strongly enriched among these candidates, highlighting the contribution of SVs to this important local adaptation trait. Beyond revealing empirical evidence for the evolutionary importance of SVs, these 31 de novo assemblies provide a valuable resource for genetic and breeding studies in T. cacao.

Keywords: cacao; de novo assembly; genetic load; local adaptation; structural variants.

Conflict of interest statement

The authors declare no competing interest.

Figures

Fig. 1.
SVs identified from 62 high-quality genome assemblies. (A) Approximate locations of the study populations. (B) Sequence contiguity (based on the 5,000 longest contigs) between the two published reference assemblies (black lines) and our 31 assemblies (one haplotype per accession; colored lines). For each of our assemblies, the cumulative sequence length increases faster than in the reference genomes, indicating higher contiguity. (C) The number of SVs identified for each accession. (D) The number of SVs overlapping different genomic features, counted individually for each accession. (E) The number of uniquely located SVs at different regions of the genome, counted in windows of 2 Mb.
Fig. 2.
Fitness effects of SVs. (A) Allele frequency spectra (AFS) for each SV type, compared to synonymous (sSNP) and nonsynonymous (nSNP) nucleotide variants. (B) AFS for SVs overlapping different genomic features. (C) Measures of selective constraint at genes affected by SVs. Shown are the ratio of nonsynonymous to synonymous nucleotide diversity (πN/πS) and the ratio of nonsynonymous to synonymous nucleotide divergence (dN/dS) for genes within the SVs and for genes overlapping the SV breakpoints. Gray horizontal lines show medians of control genes. (D) The probability that a variant within 5 Kb of a gene is associated with its expression. The SV results are compared to randomized data (RAND) and SNPs. Common variant: minor allele frequency (MAF) > 0.05; rare variant: MAF ≤ 0.05. Error bars show 95% CIs.
Fig. 3.
Genetic load at inversions. (A) Absolute nucleotide differentiation (dXY) between the major and minor homozygotes (INV), compared to random collinear regions of equal size (CTRL). (B) Average decay of linkage disequilibrium as a function of physical distance in collinear regions and in the major and minor INV homozygotes (estimated using the same sample size). (C) Percentage of derived nucleotide alleles in collinear regions and in the major and minor INV homozygotes. Results are divided into intergenic regions (>5 Kb from each gene), synonymous sites (4-fold), and nonsynonymous sites (0-fold). Error bars show 95% CIs. (D) The percentage of nonsynonymous SNPs predicted to be deleterious. Error bars show 95% CIs.
Fig. 4.
SVs and local adaptation. (A) Variation along the first two eigenvectors of a principal components analysis conducted with SVs. (B) The distribution of FST estimates for the three SV types compared to simulated neutral samples (SIM). (C) Example of a nonrecombining haplotype block, caused by a 330-Kb INV (shaded area). Shown are SNP-based FST estimates between the Iquitos and Nanay populations. (D) Expression level at genes affected by selection outlier SVs. Shown are five genes with the largest proportion of expression variance explained by the SVs (R²). (E) Relationship between 126 resequenced cacao accessions (30) used in genotyping our assembly-based SVs. Accessions are labeled according to their largest contributing ancestral population. Shown are t-distributed stochastic neighbor embedding (t-SNE) projections performed on genome-wide SNP data (SI Appendix, Fig. S19 shows t-SNE on SV data). (F) P values from an ancestry-based genome scan, conducted using 3,011 SVs genotyped for the 126 accessions. Red dashed line: Q < 0.1.
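
The Fig. 2 caption splits variants into common (MAF > 0.05) and rare (MAF ≤ 0.05) and summarizes them as allele frequency spectra. The sketch below illustrates those two definitions only; it is not the authors' pipeline. The function names, the sample size of 40 chromosomes, and the allele counts are all hypothetical, and the spectrum is folded (based on the minor allele), which may differ from how the spectra in the paper were built.

```python
from collections import Counter
from fractions import Fraction

# Cutoff taken from the Fig. 2 caption; exact fractions avoid
# floating-point surprises at the boundary (MAF == 0.05 counts as rare).
MAF_THRESHOLD = Fraction(5, 100)


def minor_allele_count(alt_count, n_chromosomes):
    """Count of the less frequent allele among the sampled chromosomes."""
    return min(alt_count, n_chromosomes - alt_count)


def classify_variant(alt_count, n_chromosomes):
    """Label a variant 'common' (MAF > 0.05) or 'rare' (MAF <= 0.05)."""
    maf = Fraction(minor_allele_count(alt_count, n_chromosomes), n_chromosomes)
    return "common" if maf > MAF_THRESHOLD else "rare"


def folded_afs(alt_counts, n_chromosomes):
    """Folded spectrum: entry k-1 is the number of variants whose minor
    allele is carried by exactly k chromosomes (k = 1 .. n // 2)."""
    spectrum = Counter(minor_allele_count(c, n_chromosomes) for c in alt_counts)
    return [spectrum.get(k, 0) for k in range(1, n_chromosomes // 2 + 1)]


# Hypothetical alternate-allele counts for seven variants in 40 chromosomes.
N_CHROMOSOMES = 40
example_counts = [1, 1, 2, 3, 10, 38, 39]

labels = [classify_variant(c, N_CHROMOSOMES) for c in example_counts]
afs = folded_afs(example_counts, N_CHROMOSOMES)
print(labels)
print(afs)
```

A spectrum skewed toward the first entries (many variants carried by only one or two chromosomes) is the pattern the abstract describes as SVs segregating at low frequencies.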

References

    1. Sturtevant A. H., The linear arrangement of six sex‐linked factors in Drosophila, as shown by their mode of association. J. Exp. Zool. 14, 43–59 (1913).
    2. McClintock B., Cytological Observations of Deficiencies Involving Known Genes, Translocations and an Inversion in Zea mays (University of Missouri, College of Agriculture, Agricultural Experiment Station, 1931).
    3. Bridges C. B., The Bar “gene” a duplication. Science 83, 210–211 (1936).
    4. Chakraborty M., Emerson J. J., Macdonald S. J., Long A. D., Structural variants exhibit widespread allelic heterogeneity and shape variation in complex traits. Nat. Commun. 10, 4872 (2019).
    5. Jiao W. B., Schneeberger K., Chromosome-level assemblies of multiple Arabidopsis genomes reveal hotspots of rearrangements with altered evolutionary dynamics. Nat. Commun. 11, 989 (2020).
