PLoS Biol., 3 (7), e196

The Pattern of Polymorphism in Arabidopsis thaliana



Magnus Nordborg et al. PLoS Biol.


We resequenced 876 short fragments in a sample of 96 individuals of Arabidopsis thaliana that included stock center accessions as well as a hierarchical sample from natural populations. Although A. thaliana is a selfing weed, the pattern of polymorphism in general agrees with what is expected for a widely distributed, sexually reproducing species. Linkage disequilibrium decays rapidly, within 50 kb. Variation is shared worldwide, although population structure and isolation by distance are evident. The data fail to fit standard neutral models in several ways. There is a genome-wide excess of rare alleles, at least partially due to selection. There is too much variation between genomic regions in the level of polymorphism. The local level of polymorphism is negatively correlated with gene density and positively correlated with segmental duplications. Because the data do not fit theoretical null distributions, attempts to infer natural selection from polymorphism data will require genome-wide surveys of polymorphism in order to identify anomalous regions. Despite this, our data support the utility of A. thaliana as a model for evolutionary functional genomics.


Figure 1. Levels of Polymorphism for Different Classes of Sites
Levels of polymorphism were quantified using two different estimators of the neutral mutation rate θ: θ^S, which uses the number of polymorphic sites, and θ^P, which uses the average number of pairwise differences [3].
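The two estimators named in this caption can be illustrated with a minimal sketch. It assumes that θ^S is Watterson's estimator (number of segregating sites divided by the harmonic number a_n) and that θ^P is the mean number of pairwise differences; the function names and the toy alignment are hypothetical, and the sketch ignores missing data and indels.

```python
from itertools import combinations

def watterson_theta(seqs):
    """Segregating sites divided by a_n = sum_{j=1}^{n-1} 1/j (per fragment)."""
    n = len(seqs)
    seg_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    a_n = sum(1.0 / j for j in range(1, n))
    return seg_sites / a_n

def pairwise_theta(seqs):
    """Average number of differences over all pairs of sequences (per fragment)."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(x != y for x, y in zip(s, t)) for s, t in pairs)
    return total / len(pairs)

# Toy aligned fragment for four hypothetical accessions (not real data).
sample = ["ACGTAC", "ACGTAT", "ACCTAC", "ACGTAC"]
theta_s = watterson_theta(sample)
theta_pi = pairwise_theta(sample)
```

Dividing either value by the fragment length gives a per-site estimate. Under the standard neutral model the two estimators have the same expectation, so a systematic gap between them is informative.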
Figure 2. Population Structure and Genomic Distributions of Various Statistics
(A) Results from Structure under different assumptions about the number of clusters (K = 2,…, 8). Each individual is represented by a line, which is partitioned into K colored segments according to the individual's estimated membership fractions in each of the K clusters. The assignment of each individual is the average across the genome. (B) Results from Structure across Chromosome 1 for K = 3. Each chromosomal segment is colored according to the cluster in which it had the highest probability of membership. (C) A plot showing those fragments that appear to be monophyletic with respect to each of the three clusters identified by Structure. (D) FST with respect to the same three clusters (blue solid line) and the lower 95th percentile of FST obtained through 1,000 random permutations of the accessions (red dotted line). (E) θ^P within each of the three clusters. (F) Tajima's D statistic within each of the three clusters. (G) Results from Structure across Chromosome 1 for K = 8. (H) A plot showing those fragments that appear to be monophyletic with respect to each of these eight clusters. (I) FST with respect to these eight clusters.
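Panel (F) reports Tajima's D, which compares the two estimators of θ. The sketch below follows the standard definition from Tajima (1989); the function name and the example numbers are hypothetical, and whether the authors applied any further adjustments is not stated here.

```python
from math import sqrt

def tajimas_d(n, seg_sites, pi):
    """Tajima's D from sample size n (n >= 4), segregating sites S, and
    mean pairwise differences pi. Returns None when there is no variation."""
    if seg_sites == 0:
        return None
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)

# Four sequences, two segregating sites, mean pairwise difference of 1.0.
# D is negative here because pi is smaller than S / a1.
d_example = tajimas_d(4, 2, 1.0)
```

Negative values indicate an excess of rare alleles relative to the constant-size neutral expectation, and positive values indicate an excess of intermediate-frequency alleles.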
Figure 3. Population Structure in A. thaliana
Each pie chart represents an accession, and is placed on the map according to origin (some of the population samples were too densely sampled and have been shifted for clarity). Accessions sampled outside Europe have been placed at the correct latitude. The exact origin of the standard lab accession Col-0 is not known. The colors and proportions within each pie chart correspond to the output of Structure in Figure 2. (A) K = 3; (B) K = 8.
Figure 4. The Distribution of Pairwise Differences (SNPs Only) between All Pairs of Accessions
(A) An example of the distribution we would expect to see in the absence of population structure, obtained by randomizing genotypes with respect to individuals for each sequenced fragment. (B) The observed distribution. (C) The observed distribution with accessions Cvi-0 and Mr-0 removed.
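The randomization described for panel (A) can be sketched as follows. This is an illustration under assumptions, not the authors' code: each fragment is represented as a list of aligned sequences indexed by accession, and all names and the toy data are hypothetical.

```python
import random
from itertools import combinations

def pairwise_difference_counts(fragments):
    """Total SNP differences for every pair of accessions, summed over fragments."""
    n = len(fragments[0])
    counts = {}
    for a, b in combinations(range(n), 2):
        counts[(a, b)] = sum(
            sum(x != y for x, y in zip(frag[a], frag[b])) for frag in fragments
        )
    return counts

def shuffle_within_fragments(fragments, rng):
    """Reassign genotypes to individuals independently for each fragment."""
    shuffled = []
    for frag in fragments:
        permuted = list(frag)
        rng.shuffle(permuted)
        shuffled.append(permuted)
    return shuffled

# Two toy fragments for four accessions; accessions 0-1 and 2-3 form two groups.
fragments = [
    ["AAT", "AAT", "CGT", "CGT"],
    ["GGA", "GGA", "TTA", "TTA"],
]
observed = pairwise_difference_counts(fragments)
randomized = pairwise_difference_counts(
    shuffle_within_fragments(fragments, random.Random(0))
)
```

Shuffling within each fragment keeps the variation at every fragment intact but breaks the association between fragments, so the total number of differences is unchanged while the genome-wide clustering of individuals is removed.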
Figure 5. Haplotype Sharing on Chromosome 4 among Pairs of Individuals in the Population Samples from Northern Sweden, Central Europe, and the US
The lines indicate regions where the particular pair of accessions share at least five identical adjacent fragments. Within-population comparisons are highlighted in red. The patterns in southern Sweden and in the UK are similar to that in central Europe.
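The sharing criterion in this caption (at least five identical adjacent fragments) can be sketched as a simple run-length scan. The representation of each accession as a list of per-fragment haplotype strings ordered along the chromosome is an assumption, as are the function name and the toy data.

```python
def shared_runs(frags_a, frags_b, min_run=5):
    """Return (start, end) index pairs, inclusive, of runs of at least
    min_run adjacent fragments that are identical in both accessions."""
    runs = []
    start = None
    length = min(len(frags_a), len(frags_b))
    for idx in range(length):
        if frags_a[idx] == frags_b[idx]:
            if start is None:
                start = idx
        else:
            if start is not None and idx - start >= min_run:
                runs.append((start, idx - 1))
            start = None
    # Close a run that extends to the last fragment.
    if start is not None and length - start >= min_run:
        runs.append((start, length - 1))
    return runs

# Toy data: eleven fragments; the pair differs only at fragment index 6.
acc_a = ["AC", "GT", "AA", "CT", "GG", "TA", "CC", "AT", "GA", "TT", "CA"]
acc_b = ["AC", "GT", "AA", "CT", "GG", "TA", "CG", "AT", "GA", "TT", "CA"]
runs = shared_runs(acc_a, acc_b)
```

In the toy data the first six fragments form a qualifying run, while the four identical fragments after the mismatch fall below the threshold and are not reported.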
Figure 6. The Decay of LD as a Function of Distance between the Polymorphisms
Figure 7. Characteristics of the Pattern of Polymorphism
(A) The allele frequency distribution for synonymous and nonsynonymous SNPs using a sample size of 90 individuals (loci with fewer than 90 individuals were not used; loci with more than 90 individuals were randomly culled). For a sample of size n, the expected frequency of SNP loci with a minor allele frequency of i under a standard constant-size population genetics model is $\left(\frac{1}{i} + \frac{1}{n-i}\right) \big/ \sum_{j=1}^{n-1} \frac{1}{j}$ for $i < n/2$. The excess of rare alleles is largely limited to frequencies one and two. (B) The distribution of Tajima's D statistic [27] across the sequenced fragments, along with its expected distribution in a constant population (estimated by simulating 1,000 datasets matching the real one in terms of exon/nonexon composition and sample size). (C) The distribution of the level of polymorphism (θ^S) across the sequenced fragments along with its expected distribution (estimated the same way). (D) The level of polymorphism in nonexon sequences as a function of the local gene density (measured in open reading frames per centimorgan). (E) The level of polymorphism in nonexon sequences as a function of the degree of duplication in each fragment (measured as the negative log10 of the BLAST significance for the second-best hit in the genome). The patterns in (D) and (E) are also seen in exons.
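The expected minor allele frequency distribution referred to in panel (A) can be computed directly. The sketch below uses the standard constant-size neutral result, in which the unfolded expectation is proportional to 1/i and folding adds the 1/(n − i) term, with the class i = n/2 counted once when n is even. The function name is hypothetical.

```python
def expected_folded_sfs(n):
    """Expected proportion of SNPs with minor allele count i (1 <= i <= n // 2)
    in a sample of size n under the standard constant-size neutral model."""
    a_n = sum(1.0 / j for j in range(1, n))
    probs = {}
    for i in range(1, n // 2 + 1):
        weight = 1.0 / i + 1.0 / (n - i)
        if i == n - i:
            # Both terms describe the same class when n is even and i = n / 2.
            weight /= 2.0
        probs[i] = weight / a_n
    return probs

# Sample size used in panel (A).
sfs = expected_folded_sfs(90)
expected_singletons = sfs[1]
expected_doubletons = sfs[2]
```

Comparing observed proportions of singletons and doubletons with `sfs[1]` and `sfs[2]` is one way to quantify the excess of rare alleles that the caption describes.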



    1. Lewontin RC, Hubby JL. A molecular approach to the study of genetic heterozygosity in natural populations. II. Amount of variation and degree of heterozygosity in natural populations of Drosophila pseudoobscura. Genetics. 1966;54:595–609.
    2. Kreitman M. Nucleotide polymorphism at the alcohol dehydrogenase locus of Drosophila melanogaster. Nature. 1983;304:412–417.
    3. Li WH. Molecular evolution. Sunderland (Massachusetts): Sinauer Associates; 1997. 487 pp.
    4. Kreitman M. Methods to detect selection in populations with applications to the human. Annu Rev Genomics Hum Genet. 2000;1:539–559.
    5. Stephens M. Inference under the coalescent. In: Balding DJ, Bishop MJ, Cannings C, editors. Handbook of statistical genetics. Chichester (United Kingdom): John Wiley and Sons; 2001. pp. 213–238.
