Nat Commun. 2020 Feb 20;11(1):989.
doi: 10.1038/s41467-020-14779-y.

Chromosome-level Assemblies of Multiple Arabidopsis Genomes Reveal Hotspots of Rearrangements With Altered Evolutionary Dynamics

Free PMC article

Wen-Biao Jiao et al. Nat Commun. 2020.

Abstract

Despite hundreds of sequenced Arabidopsis genomes, very little is known about the degree of genomic collinearity within single species, owing to the low number of chromosome-level assemblies. Here, we report chromosome-level reference-quality assemblies of seven Arabidopsis thaliana accessions selected across its global range. Each genome reveals 13-17 Mb of rearranged and 5-6 Mb of non-reference sequence, introducing copy-number changes in ~5000 genes, including ~1900 non-reference genes. Quantifying the collinearity between the genomes reveals ~350 euchromatic regions where accession-specific tandem duplications destroy the collinearity between the genomes. These hotspots of rearrangements are characterized by reduced meiotic recombination in hybrids and by genes implicated in biotic stress response. This suggests that, compared with the rest of the genome, hotspots of rearrangements undergo altered evolutionary dynamics that rely mostly on the accumulation of new mutations rather than on the recombination of existing variation, thereby enabling a quick response to biotic stress.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Chromosome-level genome assemblies of seven A. thaliana accessions.
The light gray bars outline each of the chromosomes, whereas the dark gray inlays show the extent of each pericentromeric region. The contig arrangements of the chromosome assemblies are shown in green for contigs > 1 Mb and in dark gray for contigs < 1 Mb. The locations of centromeric tandem repeat arrays and rDNA clusters within the assemblies are marked by yellow and blue boxes above each chromosome. Source Data are provided as a Source Data file.
Fig. 2. Structural and sequence differences between the genomes.
a Schematic of the structural differences (upper panel) and sequence variation (lower panel) that can be identified between chromosome-level assemblies. Note that local sequence variation can reside in syntenic as well as in rearranged regions. The bar plots on the upper right show the total span of syntenic and rearranged regions between the reference and each of the other accessions (colors match the schematic on the left): the left bar plot shows the sequence span with respect to the reference sequence, while the right bar plot shows the sequence space specific to each accession. The bar plots on the lower right show local sequence variation (per kb) in syntenic (left) and rearranged (right) regions between the reference and each of the other accessions (again, colors match the schematic on the left). b Size distributions of different types of structural and sequence variation. c Gene copy-number variation between the reference and each of the accessions. The left bar plot shows the fraction of reference genes in gene families with conserved or variable copy numbers. The right bar plot shows the number of non-reference genes found in at least two accessions or specific to a single accession genome. d Pan-genome and core-genome estimates for sequence (upper plot) and gene space (lower plot), based on all pairwise whole-genome and gene-set comparisons across all eight accessions. Each black point corresponds to a pan- or core-genome size estimated with a particular combination of genomes. Pan-genome (blue) and core-genome (red) estimates were fitted with an exponential model. Source Data are provided as a Source Data file.
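The pan- and core-genome points in Fig. 2d come from evaluating many combinations of genomes. The Python sketch below illustrates that enumeration for gene space only, under stated assumptions: each accession is reduced to a set of gene-family IDs (the toy IDs are hypothetical), the pan-genome is the union and the core-genome the intersection of a combination, and the exponential fit is omitted (it could be added afterwards, for example with `scipy.optimize.curve_fit`). It is not the authors' pipeline.

```python
from itertools import combinations
from statistics import median


def pan_core_points(gene_sets):
    """Return one (k, pan_size, core_size) tuple per combination of k genomes.

    gene_sets maps an accession name to the set of gene-family IDs it carries
    (an assumed input format). pan_size is the size of the union of the
    combination's sets, core_size the size of their intersection.
    """
    names = sorted(gene_sets)
    points = []
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            sets = [gene_sets[name] for name in combo]
            pan_size = len(set().union(*sets))
            core_size = len(set.intersection(*sets))
            points.append((k, pan_size, core_size))
    return points


def median_curve(points):
    """Median pan- and core-genome size for each number of genomes k."""
    ks = sorted({k for k, _, _ in points})
    return [
        (k,
         median(p for kk, p, _ in points if kk == k),
         median(c for kk, _, c in points if kk == k))
        for k in ks
    ]


# Toy input: hypothetical gene-family IDs for three accessions.
toy = {
    "Col-0": {"g1", "g2", "g3"},
    "Ler": {"g1", "g2", "g4"},
    "C24": {"g1", "g3", "g5"},
}
points = pan_core_points(toy)
print(median_curve(points))  # [(1, 3, 3), (2, 4, 2), (3, 5, 1)]
```

The pan-genome curve grows and the core-genome curve shrinks as k increases; with eight accessions the same loop yields 255 points per curve.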
Fig. 3. Quantitative analysis of synteny reveals hotspots of rearrangements.
a Synteny diversity along each chromosome (blue: 100 kb sliding windows with a step size of 50 kb; gray: 5 kb sliding windows with a step size of 1 kb). Red bars: R gene clusters. Gray rectangles: centromeres. The dashed green and red lines indicate synteny diversity thresholds of 0.25 and 0.50. The labelled arrow (A) indicates a 2.48 Mb inversion in the Sha genome. The labelled arrow (B) indicates the location of the example shown in d. b Gene and TE densities in 10,331 syntenic (SYN) regions and 576 hotspots of rearrangements (HOT). c The number of variable copy-number alleles in 10,331 syntenic (SYN) regions and 576 hotspots of rearrangements (HOT). d An example of a HOT region including the RPP4/RPP5 R gene cluster. The upper panel shows synteny diversity (blue curve), nucleotide diversity (gray background) and haplotype diversity (pink background) in 5 kb sliding windows with a step size of 1 kb. Both nucleotide diversity and haplotype diversity were calculated from informative markers (MAF ≥ 0.05, missing rate < 0.2) from the 1001 Genomes Project. Marker density is shown as the heatmap on top. The green and red dashed lines indicate synteny diversity values of 0.25 and 0.50, respectively. The schematic in the lower part shows the annotated protein-coding genes (colored rectangles). Blue rectangles: non-resistance genes. Other colored rectangles: resistance genes; genes with the same color belong to the same gene family. Gray links between the rectangles indicate homologous relationships between non-resistance genes. e A dot plot of Col-0 and C24 sequence from the HOT region shown in d. Red lines: homologous regions between the two genomes. f The distribution of synteny diversity values in 1 kb sliding windows in and around the 576 HOT regions.
In box plots b, c and f, centre line: median; bounds of box: 25th and 75th percentiles; whiskers: 1.5 × IQR (IQR: the interquartile range between the 25th and 75th percentiles). Source Data are provided as a Source Data file.
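Fig. 3a profiles synteny diversity in sliding windows, but this page does not define the statistic. The Python sketch below assumes a definition analogous to nucleotide diversity: for each window, the non-syntenic fraction is computed for every pair of genomes and then averaged over all pairs. It further assumes that pairwise syntenic intervals are already projected onto one shared coordinate system; the function names and input layout are hypothetical, and the window and step sizes are parameters (100 kb/50 kb or 5 kb/1 kb in the figure).

```python
from itertools import combinations


def covered_bp(intervals, start, end):
    """Bases in [start, end) covered by the union of half-open (s, e) intervals."""
    clipped = sorted(
        (max(s, start), min(e, end)) for s, e in intervals if s < end and e > start
    )
    total = 0
    cur_s = cur_e = None
    for s, e in clipped:
        if cur_e is None or s > cur_e:  # gap: close the previous merged block
            if cur_e is not None:
                total += cur_e - cur_s
            cur_s, cur_e = s, e
        else:  # overlapping or adjacent interval: extend the current block
            cur_e = max(cur_e, e)
    if cur_e is not None:
        total += cur_e - cur_s
    return total


def synteny_diversity(syntenic, genomes, chrom_len, window, step):
    """Sliding-window synteny diversity under an ASSUMED definition:
    the mean, over all genome pairs, of the non-syntenic fraction of the window.

    syntenic[(a, b)] (with a < b alphabetically) lists syntenic intervals between
    genomes a and b in shared coordinates; pairs without an entry count as
    entirely non-syntenic. Only windows lying fully inside the chromosome are used.
    """
    pairs = list(combinations(sorted(genomes), 2))
    windows = []
    for start in range(0, chrom_len - window + 1, step):
        end = start + window
        nonsyn = [1 - covered_bp(syntenic.get(p, []), start, end) / window for p in pairs]
        windows.append((start, sum(nonsyn) / len(pairs)))
    return windows


# Toy example: three genomes on a 10 bp "chromosome", 5 bp windows, 5 bp step.
toy_syntenic = {
    ("A", "B"): [(0, 10)],
    ("A", "C"): [(0, 5)],
    ("B", "C"): [(0, 5)],
}
for start, sd in synteny_diversity(toy_syntenic, ["A", "B", "C"], 10, 5, 5):
    print(start, round(sd, 3))
```

In the toy data, the second window is non-syntenic in two of the three pairs, so its value (about 0.667) would sit above both dashed thresholds drawn in Fig. 3a.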
Fig. 4. Two examples for hotspots of rearrangements.
Visualization of a the DM6 locus (RPP7) and b an unnamed R gene cluster on chromosome 5. Descriptions for the plots can be found in the legend of Fig. 3d. Source Data are provided as a Source Data file.
Fig. 5. The causes and consequences of hotspots of rearrangements.
a Crossover (CO) breakpoints identified in Col-0 × Ler hybrids were checked for overlap with syntenic or rearranged regions. Only unique CO intervals smaller than 5 kb were used. Obs.: observed. Exp.: expected. A one-sided χ² test was used. b Linkage disequilibrium (LD) calculated in 4 kb windows in and around each of the 576 HOT regions, as shown in the lower part. SYN: syntenic region. LD was calculated as the squared correlation coefficient (r²) based on informative SNP markers (MAF > 0.05, missing rate < 0.2) selected from the 1001 Genomes Project data. A one-sided Mann-Whitney U test was used. c Minor allele frequency of SNP markers in 10,331 syntenic, 10,501 partially syntenic, and 576 HOT regions. SNP markers (MAF > 0.005, missing rate < 0.2) from the 1001 Genomes Project were used. d Frequency of deleterious mutations in 10,331 syntenic (SYN) regions and 576 HOT regions. Deleterious mutations include SNPs and small indels that introduce premature stop codons, loss of start or stop codons, frameshifts, splice-site mutations or deletions of exons. A one-sided Mann-Whitney U test was used. e GO term enrichment analysis of protein-coding genes in 576 HOT regions (Fisher's exact test, p < 0.05). In box plots b and d, centre line: median; bounds of box: 25th and 75th percentiles; whiskers: 1.5 × IQR (IQR: the interquartile range between the 25th and 75th percentiles). ***p < 0.001. Source Data are provided as a Source Data file.
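Fig. 5b combines two steps: filtering markers by MAF and missing rate, then averaging pairwise r² within a window. The Python sketch below shows both steps under these assumptions: genotypes are homozygous calls (0 = reference, 1 = alternative, None = missing), as expected for inbred accessions; r² is the squared Pearson correlation computed over accessions called at both markers; and the helper names are hypothetical. The software the authors used for LD is not stated on this page.

```python
import math


def passes_filters(calls, min_maf=0.05, max_missing=0.2):
    """Keep a biallelic marker if MAF > min_maf and missing rate < max_missing."""
    if not calls:
        return False
    observed = [g for g in calls if g is not None]
    missing_rate = 1 - len(observed) / len(calls)
    if not observed or missing_rate >= max_missing:
        return False
    alt_freq = sum(observed) / len(observed)
    return min(alt_freq, 1 - alt_freq) > min_maf


def r_squared(x, y):
    """Squared Pearson correlation between two markers, using accessions called at both."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return math.nan
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    cov = sum((a - mx) * (b - my) for a, b in pairs)
    vx = sum((a - mx) ** 2 for a, _ in pairs)
    vy = sum((b - my) ** 2 for _, b in pairs)
    if vx == 0 or vy == 0:
        return math.nan  # undefined if a marker is monomorphic in the shared subset
    return cov * cov / (vx * vy)


def window_mean_ld(markers):
    """Mean r^2 over all pairs of filtered markers falling in one window."""
    kept = [m for m in markers if passes_filters(m)]
    values = [r_squared(a, b) for i, a in enumerate(kept) for b in kept[i + 1:]]
    values = [v for v in values if not math.isnan(v)]
    return sum(values) / len(values) if values else math.nan


# Toy window: two perfectly (anti-)correlated markers plus one monomorphic marker.
x = [0, 0, 1, 1, 0, 1, 0, 1, 0, 1]
print(window_mean_ld([x, [1 - g for g in x], [0] * 10]))  # 1.0
```

The monomorphic marker fails the MAF filter, so only one marker pair contributes; because r² ignores the sign of the correlation, perfectly anti-correlated markers also give 1.0.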
