2016 Jul 14
1,135 Genomes Reveal the Global Pattern of Polymorphism in Arabidopsis thaliana
Arabidopsis thaliana serves as a model organism for the study of fundamental physiological, cellular, and molecular processes. It has also greatly advanced our understanding of intraspecific genome variation. We present a detailed map of variation in 1,135 high-quality re-sequenced natural inbred lines representing the native Eurasian and North African range and recently colonized North America. We identify relict populations that continue to inhabit ancestral habitats, primarily in the Iberian Peninsula. They have mixed with a lineage that has spread to northern latitudes from an unknown glacial refugium and is now found in a much broader spectrum of habitats. Insights into the history of the species and the fine-scale distribution of genetic diversity provide the basis for full exploitation of A. thaliana natural variation through integration of genomes and epigenomes with molecular and non-molecular phenotypes.
1001 Genomes; Arabidopsis thaliana; GWAS; glacial refugia; population expansion.
Copyright © 2016 The Author(s). Published by Elsevier Inc. All rights reserved.
Origins of the 1001 Genomes Accessions (A) Collection locations of the 1001 Genomes accessions by diversity set (colors correspond to Venn diagram in B). (B) Relationships between 1001 Genomes accessions and other A. thaliana diversity sets (Nordborg et al., 2005, Cao et al., 2011, Horton et al., 2012, Long et al., 2013, Schmitz et al., 2013).
Comparison of GWAS for Flowering Time Using Full Genome Variants and RegMap SNPs (A) Long day flowering time GWAS with four replicates in 1,003 (10°C) and 971 (16°C) lines. Horizontal lines represent 5% significance thresholds corrected for multiple testing using Bonferroni (dashed) and permutations (dotted). Black and gray dots are all 1001G variants, colored dots the subset also found on the RegMap 250k array. (B) Comparison of GWAS results near flowering time regulator VIN3 (At5g57380) with the 180k biallelic SNPs (MAF > 0.03) from the 1001 Genomes full-genome set present on the RegMap 250k array. Numbers above are regional gene identifiers, e.g., “345” = “At5g57345.” Shapes denote SNP annotation: circles are non-coding; squares are synonymous; triangles are non-synonymous. Colors represent linkage disequilibrium to the top-ranked SNP in the 250k data.
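The Bonferroni threshold mentioned in panel A can be illustrated with a little arithmetic. This is a minimal sketch, not the authors' pipeline: the function names and the number of tested variants are hypothetical and chosen only to show how a 5% family-wise level translates into a per-variant cutoff on the -log10 scale of a Manhattan plot.

```python
import math

def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests

def neg_log10(p: float) -> float:
    """Convert a p-value to the -log10 scale used on Manhattan plots."""
    return -math.log10(p)

# Hypothetical number of tested variants, for illustration only.
n_variants = 1_000_000
cutoff = bonferroni_threshold(0.05, n_variants)
print(cutoff, neg_log10(cutoff))
```

A permutation-based threshold, the dotted line in the caption, would instead be estimated empirically from GWAS runs on shuffled phenotypes; that is not shown here.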
Genetic and Geographic Distances between Accessions (A) The trimodal distribution of pairwise genetic distances among accessions. The mode near zero reflects very close relationships of nearly identical accessions. The mode near 0.007 includes comparisons between relicts and non-relicts. (B) Geographic locations of relicts (red) and non-relicts (blue) in Eurasia and North Africa, with pairs of nearly identical accessions at least 1 km apart connected by green lines. (C) Genetic distances of relict pairs. Pairwise distances between Iberian relicts are of similar magnitude as distances between global non-relicts (see Figure 3A), while the distances between relict groups from different geographic regions are higher. The second mode of high divergence for Iberian relicts is due to accessions admixed with non-relicts. (D) Genetic distance increases globally with geographic distance for relicts but for non-relicts only over short distances. Horizontal lines indicate median, boxes include second and third quartiles, and whiskers indicate 1.5 times interquartile range. (E) At regional scales, the rate at which genetic distance scales with geographic distance varies greatly among geographic regions for non-relicts. For each geographic region, the plot shows the genetic distance in bins of increasing geographic distance (a bin-distance of 20 km was used for S. Sweden, Iberian Peninsula, France/Germany/Benelux and 60 km bins were used for Asia, Italy/Romania/Balkans, and Britain because of uneven sampling). The shaded areas show 95% confidence intervals calculated using the ciMean function from the R package lsr. See also Figures S1, S2, and S3.
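The binning described for panel E can be sketched as follows. This is an illustrative approximation under stated assumptions: the helper names are hypothetical, geographic distance is taken as great-circle distance on a sphere of radius 6371 km, and the confidence interval uses a normal approximation, which may differ from what the R function ciMean computes.

```python
import math
from collections import defaultdict
from statistics import NormalDist, mean, stdev

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def binned_means(pairs, bin_km=20.0, level=0.95):
    """Group (geo_km, genetic_distance) pairs into geographic distance bins.

    Returns {bin_start_km: (mean, ci_low, ci_high)}. A bin holding a single
    pair gets a zero-width interval because no spread can be estimated.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # assumption: normal approximation
    bins = defaultdict(list)
    for geo_km, gen_dist in pairs:
        bins[math.floor(geo_km / bin_km) * bin_km].append(gen_dist)
    out = {}
    for start, values in sorted(bins.items()):
        m = mean(values)
        half = z * stdev(values) / math.sqrt(len(values)) if len(values) > 1 else 0.0
        out[start] = (m, m - half, m + half)
    return out
```

Calling `binned_means(pairs, bin_km=60.0)` would correspond to the wider bins the caption reports for the more sparsely sampled regions.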
Evidence for the Importance of the Last Glacial Maximum in Structuring Historic and Modern Distribution of Relict and Non-relict Groups (A) Coalescence rates over time for pairs of individuals from different ADMIXTURE groups, inferred using MSMC. Comparisons are between non-relicts from the same group (blue), Iberian relicts (red), non-relicts from different groups (purple), and relicts and non-relicts (green). The latter also includes comparisons of relicts from different geographic regions, which look similar to relict—non-relict comparisons. Solid lines indicate means, shading standard deviations. Between 49 and 62 random pairs were used. Light blue vertical bars show the last four glacial periods. (B) Left, distributions of pairwise nucleotide diversity in 5-kb windows for four selected pairs of accessions. Colors indicate provenance of accessions, shown on right. Inset, counts in the extreme tail of the distributions. See also Table S1.
Footprints of Selection (A) Distribution of accessions containing the reference or alternate variant for a locus strongly associated with precipitation in the wettest quarter. The alternate allele is most frequent in the Asian group, but it is also present in other groups. (B) A climate associated and spatially disjunct SNP (red dashed line), located in a region densely populated with genes affecting traits such as root growth, salt tolerance, flowering, and detoxification. (C) The distribution of maximum FST scores in 10-kb windows along chromosome 2. The centromere is shaded, and the locations of NLR-containing disease resistance genes are in red. (D) The distribution of ω statistics in 10-kb windows along chromosome 2. Labels as in Figure 5C. See also Figures S4 and S5 and Tables S2, S3, and S4.
Local Genetic Diversity in Different Regions and Groups (A) Current land use, current and paleoclimate for relicts and non-relicts. Relicts are purple (∗∗p < 0.01; ∗∗∗p < 0.001). Horizontal lines indicate median, boxes include second and third quartiles, and whiskers indicate 1.5 times the interquartile range. (B) The geographic distribution of average pairwise distance (π) and Tajima’s D. Sizes of the green circles indicate regional π (range from 0.002 [USA] to 0.006 [Iberian Peninsula]). Dotted circles indicate the global value, 0.006. Sizes of the purple circles represent the regional values of Tajima’s D (range from −1.01 [Northern Sweden] to −2.08 [USA], global value −2.04). Blue dots indicate sampling sites. (C) Regional diversity as a function of latitude or longitude. (D) Rank ordered distribution of non-private variants in each accession by ADMIXTURE group, offset to show density. See also Figure S6.
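For readers unfamiliar with the two statistics in panel B, the sketch below computes average pairwise distance (π) and Tajima's D from a toy alignment using the standard formulas from Tajima (1989). It is illustrative only: the function names and the four toy haplotypes are hypothetical, and it is not the software used in the study.

```python
import math
from itertools import combinations

def mean_pairwise_differences(haplotypes):
    """Average number of differing sites over all pairs of aligned sequences."""
    pairs = list(combinations(haplotypes, 2))
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return diffs / len(pairs)

def segregating_sites(haplotypes):
    """Number of alignment columns carrying more than one allele."""
    return sum(len(set(column)) > 1 for column in zip(*haplotypes))

def tajimas_d(haplotypes):
    """Tajima's D; returns NaN when undefined (fewer than 4 sequences or no variation)."""
    n = len(haplotypes)
    s = segregating_sites(haplotypes)
    if n < 4 or s == 0:
        return float("nan")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    k = mean_pairwise_differences(haplotypes)
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))

# Toy alignment in which every variant is a singleton (hypothetical data).
toy = ["TAAA", "ATAA", "AATA", "AAAT"]
pi_per_site = mean_pairwise_differences(toy) / len(toy[0])
print(pi_per_site, tajimas_d(toy))
```

In the toy alignment every variant is carried by a single sequence, so D comes out negative. An excess of rare variants of this kind is what the negative regional values reported in the caption indicate.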
Overall Relationship between Total Length of Identity-by-Descent Segments and Geographical Distance in Kilometers for Each Pair in Three Groups, Related to Figure 3. In the US, many pairs share IBD segments that total over 85 Mb; in Europe, this is the case for only 0.05% of pairs, with the vast majority of pairs having IBD sharing in the range of 15-25 Mb. The minimal IBD length threshold was 10 kb. Intermediate values of intercontinental comparisons are consistent with a recent colonization of North America from European ancestors. Hexagonal bins of density, with generalized additive model predictions (k = 3, solid lines) and the 95% CI of these predictions (shaded regions).
Geographic Prediction from SPA Suggests that a Simple Isolation-by-Distance Model Does Not Hold, Related to Figure 3 (A and B) Under this model, if geographic gradients of SNPs were smooth, the collection locations of the accessions should be recovered in the SPA analysis in B. They are not. Instead, we find strong spatial gradients between, and gentle gradients within groups. (C and D) The strength of these gradients is disproportionately influenced by the relict and UK accessions. (E) Finer geographic gradients of SNP variation, especially in Southern Swedish populations, are observable when relict and UK accessions are excluded. Note that unsupervised predictions from SPA analysis are translation, scale, and rotation invariant (dimensionless).
Nucleotide Diversity (π) and Spatial Gradients of Genetic Diversity (SPA) by Group in 50 kb Windows, Related to Figure 3 While overall genetic diversity across the genome is similar across genetic groups, the geographic distribution of that variability differs across groups, with the lowest associations between the North Swedish and Asian groups. Correlations that did not reach the significance threshold of p = 0.01 are marked with an “X.”
FST and ω for Each Chromosome, Related to Figure 5 (A) Distribution of maximum FST scores in 10-kb windows. The centromere is shaded in each figure, and the locations of NB-LRR genes (“resistance” or R genes) are shown in red. (B) Distribution of ω-statistic in 10-kb windows. Labels as in A. On each chromosome, the lowest genome-wide FST values, and largest estimates of the ω-statistic, are near the centromeres, which suggests that selective sweeps or background selection are common in these regions of the genome.
Variants by Type and Group, Related to Figure 5 Mean and standard deviation of the standard score (Z-score) of the number of variants of each type, by group. Relicts show the greatest normalized number of variants, especially of synonymous variants. Bars indicate means, whiskers represent one standard deviation.
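The standard score used in this caption can be shown with a short example. This is a minimal sketch under assumptions: the counts are made up, the function name is hypothetical, and the sample standard deviation is used, which may not match the normalization the authors applied.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize values to zero mean and unit (sample) standard deviation."""
    m = mean(values)
    s = stdev(values)
    if s == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return [(v - m) / s for v in values]

# Hypothetical per-accession counts of one variant type (not data from the paper).
counts = [100, 120, 140]
print(z_scores(counts))
```

Averaging such scores within each group, per variant type, would give bars and whiskers of the kind the caption describes.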
Climatic and Geographic Representation of Samples within Groups, Related to Figure 6 While the distribution of climate within each group is generally in proportion to the distribution of geography (latitude, longitude, and elevation), some are not. For example, although the Asian accessions are widely distributed, the range of precipitation experienced by these accessions is surprisingly narrow. Note that a few non-Eurasian accessions were nevertheless assigned to admixture groups, and are included in these distributions. Violin plots show probability densities within admixture groups for each geoclimatic variable.
Natural Genetic Variation of Arabidopsis thaliana Is Geographically Structured in the Iberian Peninsula
FX Picó et al.
Genetics 180 (2), 1009-21.
To understand the demographic history of Arabidopsis thaliana within its native geographical range, we have studied its genetic structure in the Iberian Peninsula region. …
Epigenomic Diversity in a Global Collection of Arabidopsis thaliana Accessions
T Kawakatsu et al.
Cell 166 (2), 492-505.
The epigenome orchestrates genome accessibility, functionality, and three-dimensional structure. Because epigenetic variation can impact transcription and thus phenotypes …
The Genetic Structure of Arabidopsis thaliana in the South-Western Mediterranean Range Reveals a Shared History Between North Africa and Southern Europe
AC Brennan et al.
BMC Plant Biol 14, 17.
The patterns of genetic diversity and structure of A. thaliana in Morocco show that North Africa is part of the species native range and support the occurrence of a glaci …
Genomic Variation in Arabidopsis: Tools and Insights From Next-Generation Sequencing
Chromosome Res 22 (2), 103-15.
The release of a reference genome for Arabidopsis thaliana in 2000 has been an enormous boon for the study of plant genetics. Less than a decade later, however, a revolut …
Epigenetic and Epigenomic Variation in Arabidopsis thaliana
RJ Schmitz et al.
Trends Plant Sci 17 (3), 149-54.
Arabidopsis thaliana (Arabidopsis) is ideally suited for studies of natural phenotypic variation. This species has also provided an unparalleled experimental system to ex …
PubMed Central articles
Extreme Genetic Signatures of Local Adaptation During Lotus japonicus Colonization of Japan
N Shah et al.
Nat Commun 11 (1), 253.
Colonization of new habitats is expected to require genetic adaptations to overcome environmental challenges. Here, we use full genome re-sequencing and extensive common …
Cryptic Variation in RNA-directed DNA-methylation Controls Lateral Root Development When Auxin Signalling Is Perturbed
Z Shahzad et al.
Nat Commun 11 (1), 218.
Maintaining the right balance between plasticity and robustness in biological systems is important to allow adaptation while maintaining essential functions. Developmenta …
Common Alleles of CMT2 and NRPE1 Are Major Determinants of CHH Methylation Variation in Arabidopsis thaliana
E Sasaki et al.
PLoS Genet 15 (12), e1008492.
DNA cytosine methylation is an epigenetic mark associated with silencing of transposable elements (TEs) and heterochromatin formation. In plants, it occurs in three seque …
Common Gardens in Teosintes Reveal the Establishment of a Syndrome of Adaptation to Altitude
MA Fustier et al.
PLoS Genet 15 (12), e1008512.
In plants, local adaptation across species range is frequent. Yet, much has to be discovered on its environmental drivers, the underlying functional traits and their mole …
Complement Genome Annotation Lift Over Using a Weighted Sequence Alignment Strategy
B Song et al.
Front Genet 10, 1046.
With the broad application of high-throughput sequencing, more whole-genome resequencing data and de novo assemblies of natural populations are becoming available. …