Abstract

Arabidopsis thaliana serves as a model organism for the study of fundamental physiological, cellular, and molecular processes. It has also greatly advanced our understanding of intraspecific genome variation. We present a detailed map of variation in 1,135 high-quality re-sequenced natural inbred lines representing the native Eurasian and North African range and recently colonized North America. We identify relict populations that continue to inhabit ancestral habitats, primarily in the Iberian Peninsula. They have mixed with a lineage that has spread to northern latitudes from an unknown glacial refugium and is now found in a much broader spectrum of habitats. Insights into the history of the species and the fine-scale distribution of genetic diversity provide the basis for full exploitation of A. thaliana natural variation through integration of genomes and epigenomes with molecular and non-molecular phenotypes.

Keywords: 1001 Genomes; Arabidopsis thaliana; GWAS; glacial refugia; population expansion.

Figures

Figure 1
Origins of the 1001 Genomes Accessions (A) Collection locations of the 1001 Genomes accessions by diversity set (colors correspond to Venn diagram in B). (B) Relationships between 1001 Genomes accessions and other A. thaliana diversity sets (Nordborg et al., 2005, Cao et al., 2011, Horton et al., 2012, Long et al., 2013, Schmitz et al., 2013).
Figure 2
Comparison of GWAS for Flowering Time Using Full Genome Variants and RegMap SNPs (A) Long day flowering time GWAS with four replicates in 1,003 (10°C) and 971 (16°C) lines. Horizontal lines represent 5% significance thresholds corrected for multiple testing using Bonferroni (dashed) and permutations (dotted). Black and gray dots are all 1001G variants, colored dots the subset also found on the RegMap 250k array. (B) Comparison of GWAS results near flowering time regulator VIN3 (At5g57380) with the 180k biallelic SNPs (MAF > 0.03) from the 1001 Genomes full-genome set present on the RegMap 250k array. Numbers above are regional gene identifiers, e.g., “345” = “At5g57345.” Shapes denote SNP annotation: circles are non-coding; squares are synonymous; triangles are non-synonymous. Colors represent linkage disequilibrium to the top-ranked SNP in the 250k data.
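The Bonferroni threshold mentioned in the Figure 2 legend can be illustrated with a minimal sketch. It assumes a 5% family-wise error rate and, purely for illustration, uses the 180k SNP count quoted for panel B as the number of tests; the function names are hypothetical and are not taken from the study's own pipeline.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def neg_log10(p: float) -> float:
    """Convert a p-value to the -log10 scale commonly used on Manhattan plots."""
    return -math.log10(p)


# Illustrative assumption: 180,000 tests at a 5% family-wise error rate.
threshold = bonferroni_threshold(0.05, 180_000)
threshold_on_plot = neg_log10(threshold)
```

A permutation-based threshold, the dotted line in the same panel, would instead be estimated empirically from the distribution of the strongest association in each permuted dataset and is not shown here.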
Figure 3
Genetic and Geographic Distances between Accessions (A) The trimodal distribution of pairwise genetic distances among accessions. The mode near zero reflects very close relationships of nearly identical accessions. The mode near 0.007 includes comparisons between relicts and non-relicts. (B) Geographic locations of relicts (red) and non-relicts (blue) in Eurasia and North Africa, with pairs of nearly identical accessions at least 1 km apart connected by green lines. (C) Genetic distances of relict pairs. Pairwise distances between Iberian relicts are of similar magnitude as distances between global non-relicts (see Figure 3A), while the distances between relict groups from different geographic regions are higher. The second mode of high divergence for Iberian relicts is due to accessions admixed with non-relicts. (D) Genetic distance increases globally with geographic distance for relicts but for non-relicts only over short distances. Horizontal lines indicate median, boxes include second and third quartiles, and whiskers indicate 1.5 times interquartile range. (E) At regional scales, the rate at which genetic distance scales with geographic distance varies greatly among geographic regions for non-relicts. For each geographic region, the plot shows the genetic distance in bins of increasing geographic distance (a bin-distance of 20 km was used for S. Sweden, Iberian Peninsula, France/Germany/Benelux and 60 km bins were used for Asia, Italy/Romania/Balkans, and Britain because of uneven sampling). The shaded areas show 95% confidence intervals calculated using the ciMean function from the R package lsr. See also Figures S1, S2, and S3.
Figure 4
Evidence for the Importance of the Last Glacial Maximum in Structuring Historic and Modern Distribution of Relict and Non-relict Groups (A) Coalescence rates over time for pairs of individuals from different ADMIXTURE groups, inferred using MSMC. Comparisons are between non-relicts from the same group (blue), Iberian relicts (red), non-relicts from different groups (purple), and relicts and non-relicts (green). The latter also includes comparisons of relicts from different geographic regions, which look similar to relict versus non-relict comparisons. Solid lines indicate means, shading indicates standard deviations. Between 49 and 62 random pairs were used. Light blue vertical bars show the last four glacial periods. (B) Left, distributions of pairwise nucleotide diversity in 5-kb windows for four selected pairs of accessions. Colors indicate provenance of accessions, shown on right. Inset, counts in the extreme tail of the distributions. See also Table S1.
Figure 5
Footprints of Selection (A) Distribution of accessions containing the reference or alternate variant for a locus strongly associated with precipitation in the wettest quarter. The alternate allele is most frequent in the Asian group, but it is also present in other groups. (B) A climate associated and spatially disjunct SNP (red dashed line), located in a region densely populated with genes affecting traits such as root growth, salt tolerance, flowering, and detoxification. (C) The distribution of maximum FST scores in 10-kb windows along chromosome 2. The centromere is shaded, and the locations of NLR-containing disease resistance genes are in red. (D) The distribution of ω statistics in 10-kb windows along chromosome 2. Labels as in Figure 5C. See also Figures S4 and S5 and Tables S2, S3, and S4.
Figure 6
Local Genetic Diversity in Different Regions and Groups (A) Current land use, current and paleoclimate for relicts and non-relicts. Relicts are purple (∗∗p < 0.01; ∗∗∗p < 0.001). Horizontal lines indicate median, boxes include second and third quartiles, and whiskers indicate 1.5 times the interquartile range. (B) The geographic distribution of average pairwise distance (π) and Tajima’s D. Sizes of the green circles indicate regional π (range from 0.002 [USA] to 0.006 [Iberian Peninsula]). Dotted circles indicate the global value, 0.006. Sizes of the purple circles represent the regional values of Tajima’s D (range from −1.01 [Northern Sweden] to −2.08 [USA], global value −2.04). Blue dots indicate sampling sites. (C) Regional diversity as a function of latitude or longitude. (D) Rank ordered distribution of non-private variants in each accession by ADMIXTURE group, offset to show density. See also Figure S6.
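The two statistics mapped in Figure 6B, average pairwise distance (π) and Tajima's D, can be sketched with their textbook definitions. This is a minimal illustration, not the authors' pipeline: it assumes fully called biallelic sites coded 0/1 with one haplotype per inbred accession, and the function names and toy data are hypothetical.

```python
import math


def pairwise_diversity(haplotypes):
    """Return (pi, S): mean pairwise differences summed over sites, and the
    number of segregating sites. Divide pi by the sequence length in bp to
    obtain a per-site value."""
    n = len(haplotypes)
    n_pairs = n * (n - 1) / 2
    pi, segregating = 0.0, 0
    for site in zip(*haplotypes):      # iterate over columns (sites)
        k = sum(site)                  # copies of the alternate allele
        if 0 < k < n:
            segregating += 1
            pi += k * (n - k) / n_pairs
    return pi, segregating


def tajimas_d(haplotypes):
    """Tajima's D (Tajima 1989); NaN when no site is segregating."""
    n = len(haplotypes)
    if n < 4:
        raise ValueError("need at least 4 haplotypes for a non-zero variance")
    pi, s = pairwise_diversity(haplotypes)
    if s == 0:
        return float("nan")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Difference between pi and Watterson's estimator, scaled by its
    # estimated standard deviation.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Toy data: four haplotypes, each carrying one private (singleton) variant.
singletons = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
d_singletons = tajimas_d(singletons)
```

An excess of rare variants, as in the toy data, pulls D below zero, which is the direction of the regional and global values reported in the legend.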
Figure S1
Overall Relationship between Total Length of Identity-by-Descent Segments and Geographical Distance in Kilometers for Each Pair in Three Groups, Related to Figure 3 In the US, many pairs share IBD segments that total over 85 Mb; in Europe, this is the case for only 0.05% of pairs, with the vast majority of pairs having IBD sharing in the range of 15–25 Mb. The minimal IBD length threshold was 10 kb. Intermediate values of intercontinental comparisons are consistent with a recent colonization of North America from European ancestors. Hexagonal bins of density, with generalized additive model predictions (k = 3, solid lines) and the 95% CI of these predictions (shaded regions).
Figure S2
Geographic Prediction from SPA Suggests that a Simple Isolation-by-Distance Model Does Not Hold, Related to Figure 3 (A and B) Under this model, if geographic gradients of SNPs were smooth, the collection locations of the accessions should be recovered in the SPA analysis in B. They are not. Instead, we find strong spatial gradients between, and gentle gradients within groups. (C and D) The strength of these gradients is disproportionately influenced by the relict and UK accessions. (E) Finer geographic gradients of SNP variation, especially in Southern Swedish populations, are observable when relict and UK accessions are excluded. Note that unsupervised predictions from SPA analysis are translation, scale, and rotation invariant (dimensionless).
Figure S3
Nucleotide Diversity (π) and Spatial Gradients of Genetic Diversity (SPA) by Group in 50 kb Windows, Related to Figure 3 While overall genetic diversity across the genome is similar across genetic groups, the geographic distribution of that variability differs across groups, with the lowest associations between the North Swedish and Asian groups. Correlations that did not reach the significance threshold of p = 0.01 are marked with an “X.”
Figure S4
Genome-wide FST and ω for Each Chromosome, Related to Figure 5 (A) Distribution of maximum FST scores in 10-kb windows. The centromere is shaded in each figure, and the locations of NB-LRR genes (“resistance” or R genes) are shown in red. (B) Distribution of ω-statistic in 10-kb windows. Labels as in A. On each chromosome, the lowest genome-wide FST values, and largest estimates of the ω-statistic, are near the centromeres, which suggests that selective sweeps or background selection are common in these regions of the genome.
Figure S5
Variants by Type and Group, Related to Figure 5 Mean and standard deviation of the standard score (Z-score) of the number of variants of each type, by group. Relicts show the greatest normalized number of variants, especially of synonymous variants. Bars indicate means, whiskers represent one standard deviation.
Figure S6
Climatic and Geographic Representation of Samples within Groups, Related to Figure 6 While the distribution of climate within each group is generally in proportion to the distribution of geography (latitude, longitude, and elevation), some are not. For example, although the Asian accessions are widely distributed, the range of precipitation experienced by these accessions is surprisingly narrow. Note that a few non-Eurasian accessions were nevertheless assigned to admixture groups, and are included in these distributions. Violin plots show probability densities within admixture groups for each geoclimatic variable.
