PLoS Genet, 5 (6), e1000500

The Role of Geography in Human Adaptation



Graham Coop et al. PLoS Genet.


Various observations argue for a role of adaptation in recent human evolution, including results from genome-wide studies and analyses of selection signals at candidate genes. Here, we use genome-wide SNP data from the HapMap and CEPH-Human Genome Diversity Panel samples to study the geographic distributions of putatively selected alleles at a range of geographic scales. We find that the average allele frequency divergence is highly predictive of the most extreme FST values across the whole genome. On a broad scale, the geographic distribution of putatively selected alleles almost invariably conforms to population clusters identified using randomly chosen genetic markers. Given this structure, there are surprisingly few fixed or nearly fixed differences between human populations. Among the nearly fixed differences that do exist, nearly all are due to fixation events that occurred outside of Africa, and most appear in East Asia. These patterns suggest that selection is often weak enough that neutral processes, especially population history, migration, and drift, exert powerful influences over the fate and geographic distribution of selected alleles.

Conflict of interest statement

The authors have declared that no competing interests exist.


Figure 1. Genic SNPs are more likely than nongenic SNPs to have extreme allele frequency differences between populations.
For each plot the x-axis shows the signed difference in derived allele frequency between two HapMap populations. The y-axis plots the fold enrichment of genic and nongenic SNPs as a function of that difference: i.e., for each bin we plot the fraction of SNPs in that bin that are genic (respectively, nongenic), divided by the fraction of all SNPs that are genic (respectively, nongenic). The peach-colored region gives the central 90% confidence interval (estimated by bootstrap resampling of 200 kb regions from the genome); when the lower edge of the peach region is >1 this indicates significant enrichment of genic SNPs, assuming a one-tailed test at p = 0.05. Genotype frequencies were estimated from Phase II HapMap data using only SNPs that were identified by Perlegen in a uniform multiethnic panel (“Type A” SNPs). The numbers of SNPs in the tails are given in Supplementary Table 1 in Text S1.
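The fold-enrichment statistic described in this caption can be illustrated with a minimal Python sketch. The function name, the argument names (`deltas`, `is_genic`), and the bin width are hypothetical choices made for illustration; the caption does not state the binning the authors used, and the bootstrap confidence interval is not reproduced here.

```python
from collections import Counter


def fold_enrichment(deltas, is_genic, bin_width=0.1):
    """Return {bin_start: (genic_fold, nongenic_fold)} over bins of delta.

    deltas   -- signed derived-allele frequency differences, each in [-1, 1]
    is_genic -- one boolean per SNP (True if the SNP is genic)
    Assumes the data contain at least one genic and one nongenic SNP.
    """
    n_total = len(deltas)
    genic_overall = sum(is_genic) / n_total
    nongenic_overall = 1.0 - genic_overall
    n_bins = round(2 / bin_width)

    totals = Counter()
    genic = Counter()
    for d, g in zip(deltas, is_genic):
        # Map delta in [-1, 1] to a bin index; delta == 1 joins the last bin.
        idx = min(int((d + 1.0) / bin_width), n_bins - 1)
        totals[idx] += 1
        genic[idx] += bool(g)

    result = {}
    for idx in sorted(totals):
        frac_genic = genic[idx] / totals[idx]
        bin_start = round(-1.0 + idx * bin_width, 10)
        # Fraction in this bin divided by the genome-wide fraction.
        result[bin_start] = (
            frac_genic / genic_overall,
            (1.0 - frac_genic) / nongenic_overall,
        )
    return result
```

A genic fold value above 1 in a bin means genic SNPs are over-represented there relative to their share of all SNPs, which is the quantity the plots track in the tails.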
Figure 2. The relationship between mean FST and the most extreme allele frequency differences genome-wide between pairs of HGDP populations.
The x-axis of each plot shows the autosomal mean FST for pairs of HGDP populations, considering all possible pairs from among the 26 HGDP populations with samples of ≥15 individuals. The y-axes show (A) the maximum autosomal allele frequency difference for each population pair, and (B) the 65th most extreme allele frequency difference for each population pair (i.e., the 99.99th percentile of the allele frequency distribution). To provide a sense of scale on the figure, red arrows are used to indicate the mean autosomal pairwise FST between some arbitrary pairs of populations (key: French (Fra), Palestinian (Pal), Han-Chinese (Han) and Yoruba (Yor)). The red lines plot lowess fits to the data. Plots of the extremes of pairwise FST and with different sample size cutoffs are similar (Supplementary Figures 5 and 6 in Text S1).
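The correspondence between the 65th most extreme value and the 99.99th percentile can be checked with a short Python sketch. The function names are hypothetical, and the SNP count of 650,000 used in the usage note is only the value implied by the caption's own numbers (65 divided by 0.0001); the caption does not state the count.

```python
def percentile_of_rank(rank, n_values):
    """Percentile below which all but the top `rank` of `n_values` values fall."""
    return 100.0 * (1.0 - rank / n_values)


def kth_most_extreme(abs_differences, k):
    """Return the k-th largest value in the list (k = 1 is the maximum)."""
    return sorted(abs_differences, reverse=True)[k - 1]
```

Under that assumed count, `percentile_of_rank(65, 650_000)` evaluates to approximately 99.99, and `kth_most_extreme(values, 1)` corresponds to panel (A) while `kth_most_extreme(values, 65)` corresponds to panel (B).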
Figure 3. Global allele frequency distributions for SNPs with extreme FST between certain population pairs.
Each row plots frequency distributions for 50 of the most extreme SNPs genome-wide in the following pairs of comparisons: (A, B): SNPs for which Yoruba are highly differentiated from both Han and French; (C, D): French are differentiated from Yoruba and Han; (E, F): Han are differentiated from Yoruba and French. Left column: pie charts of the mean allele frequencies of the 50 highly differentiated SNPs across the HGDP populations; blue and red denote the major and minor alleles in Yoruba, respectively. Right column: The same data are plotted in an expanded format: populations with ≥10 sampled individuals are listed along the x-axis, roughly ordered by geography; vertical grey lines divide the populations based on broad geographic region and dashed grey lines identify populations known to be admixed between broad geographic regions. The y-axis plots allele frequencies in each population; alleles are polarized according to the minor allele in Yoruba. Individual SNP frequencies in each population are shown as grey dots. The mean and median frequencies are shown as grey and black lines, respectively; the peach-colored region shows the frequency interval containing the central 94% of the plotted SNP frequencies in each population. SNPs were selected so that each plot includes at most one SNP from clusters of high-FST SNPs (Methods).
Figure 4. Global allele frequencies and haplotype patterns at three genes with signals of positive selection.
The left-hand column shows pie charts of allele frequencies (blue ancestral, red derived) across the HGDP populations for: (A) a SNP upstream of KITLG (rs1881227); and for nonsynonymous SNPs in (B) SLC24A5 (rs1426654; data from [18]), and (C) MC1R (rs885479). The right-hand column shows a representation of haplotype patterns for 500 kb around each gene, in each case centered on the SNP displayed in the pie charts. Each box represents a single population, and observed haplotypes are plotted as thin horizontal lines, using the same haplotype coloring for all populations (see Methods and [59]). In all three cases the derived allele plotted in the pie charts is found mainly on the red haplotype.
Figure 5. Derived allele frequencies of SNPs with extreme frequency differences between pairs of HapMap populations.
In each plot, each red or blue line indicates the derived allele frequencies of a single SNP in the HapMap YRI, CEU, and ASN population groups. The plots show SNPs with extreme frequency differences (>90%) between each pair of HapMap groups: YRI–ASN (left), YRI–CEU (middle), CEU–ASN (right). The data are for Perlegen Type A SNPs genotyped in HapMap. The red lines show alleles that have high derived frequency in the first population and the upper number on each plot indicates the total number of such SNPs; the blue lines and lower numbers are for alleles that are at high frequency in the second population.
Figure 6. The distribution of XP-EHH, a measure of haplotype homozygosity, at high-FST SNPs in East Asians.
The solid line shows the distribution of XP-EHH in the ASN population at SNPs with a frequency difference >90% between the ASN and YRI samples. For comparison, we plot the XP-EHH distribution both for SNPs randomly chosen from the HapMap and for simulated SNPs with a selective advantage of 1%. These analyses used the full HapMap data, but with only one high-FST SNP chosen in genomic regions where there are clusters of high-FST SNPs (see Methods). Simulations applied the cosi demographic model with minor modifications [7, Methods]. SNPs simulated with selection were included if there was a frequency difference >90% between ASN and YRI and the derived allele was at high frequency in ASN. Density curves were obtained using the default settings of the density function in R.
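For readers without R at hand, the kind of density curve mentioned in this caption can be approximated with a small Python sketch. It assumes a Gaussian kernel and the rule-of-thumb bandwidth that R's `density` uses by default (`bw.nrd0`); it evaluates the kernel sum directly rather than using R's binned FFT approximation, so values will differ slightly from R's output. The function names and the choice of grid are hypothetical.

```python
import math
import statistics


def bw_nrd0(x):
    """Rule-of-thumb bandwidth: 0.9 * min(sd, IQR / 1.34) * n ** (-1/5)."""
    sd = statistics.stdev(x)
    # The "inclusive" method matches R's default (type 7) quantiles.
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    spread = min(sd, (q3 - q1) / 1.34)
    if spread == 0:
        # Fallback chain for degenerate samples.
        spread = sd or abs(x[0]) or 1.0
    return 0.9 * spread * len(x) ** -0.2


def gaussian_kde(x, grid, bw=None):
    """Evaluate a Gaussian kernel density estimate of sample x on grid."""
    if bw is None:
        bw = bw_nrd0(x)
    norm = 1.0 / (len(x) * bw * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - xi) / bw) ** 2) for xi in x)
        for g in grid
    ]
```

Calling `gaussian_kde(scores, grid)` on a list of XP-EHH values and a list of evaluation points returns one density value per grid point, which can then be plotted as a curve.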
Figure 7. Average allele frequency trajectories of selected alleles, as a function of the strength of selection.
The lines plot the mean trajectories of codominant alleles, starting from frequency formula image at time 0, conditional on the alleles not being lost within 4000 generations. Simulations were performed under an effective population size of 24,000, chosen to match the effective population size of the ‘Yoruba’ in cosi. To provide some context, the bars at the top indicate the divergence times of the HapMap Europeans and Asians, and HapMap Africans and non-Africans according to the cosi model, though it should be noted that there is considerable uncertainty in the true split times. The numbers in parentheses indicate times in years, assuming 20 years per generation.
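A mean trajectory of this kind can be sketched with a simple Wright-Fisher simulation in Python. This is an illustration under stated assumptions, not the authors' simulation code: genotype fitnesses are taken to be 1, 1+s, and 1+2s (one common scaling for codominance; the caption does not give the scaling), the starting frequency `p0` is a free parameter because it is not recoverable from this text, and the population size should be kept far below the 24,000 quoted in the caption for the pure-Python binomial sampling to run quickly. All function and parameter names are hypothetical.

```python
import random


def next_expected_freq(p, s):
    """Frequency after selection, for genotype fitnesses 1, 1+s, 1+2s (assumed)."""
    return min(1.0, p * (1.0 + s + s * p) / (1.0 + 2.0 * s * p))


def simulate_trajectory(n_diploid, s, p0, generations, rng):
    """One Wright-Fisher trajectory; returns None if the allele is lost."""
    n_chrom = 2 * n_diploid
    p = p0
    trajectory = [p]
    for _ in range(generations):
        p_sel = next_expected_freq(p, s)
        # Binomial sampling of 2N chromosomes models genetic drift.
        count = sum(rng.random() < p_sel for _ in range(n_chrom))
        if count == 0:
            return None
        p = count / n_chrom
        trajectory.append(p)
    return trajectory


def mean_trajectory(n_diploid, s, p0, generations, n_kept, seed=1):
    """Average n_kept trajectories, conditional on the allele not being lost."""
    rng = random.Random(seed)
    kept = []
    while len(kept) < n_kept:
        trajectory = simulate_trajectory(n_diploid, s, p0, generations, rng)
        if trajectory is not None:
            kept.append(trajectory)
    # Average across replicates at each generation.
    return [sum(column) / n_kept for column in zip(*kept)]
```

For example, `mean_trajectory(n_diploid=50, s=0.05, p0=0.1, generations=30, n_kept=20)` returns 31 mean frequencies, one per generation including the start. Discarding lost replicates is what makes the average conditional on survival, as in the caption.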
Figure 8. Population bottlenecks can simultaneously increase both the rate of loss and the rate of fixation of favored alleles.
Trajectories of favored variants were simulated according to demographic models for the (A) Yoruba, and (B) East Asian populations. In each simulation the selected variant was introduced 4000 generations before the present (∼80 KYA), i.e., prior to the out-of-Africa event. The plots show heat maps of the distributions of frequencies at each time, conditional on the allele not being lost by the present day (time = 0). The timing of bottleneck events in the model is indicated by vertical grey bars in the ASN population. Redder shades indicate a higher density of selected mutations in a particular frequency bin. The black lines indicate the mean frequencies and the grey lines bracket the central 95% of the frequency distributions. The histograms on the right show the frequency spectrum of favored mutations in the present day, for each population, excluding mutations at frequency 0. The area of each histogram is proportional to the fraction of selected alleles that have frequency >0 in the present. Notice from the histograms that a much larger fraction of favored alleles survive to the present under the YRI demography, even though the fraction of alleles that are near fixation is much smaller in the YRI.



References

    1. Sabeti P, Schaffner S, Fry B, Lohmueller J, Varilly P, et al. Positive natural selection in the human lineage. Science. 2006;312:1614–1620.
    2. Volkman S, Sabeti P, DeCaprio D, Neafsey D, Schaffner S, et al. A genome-wide map of diversity in Plasmodium falciparum. Nat Genet. 2007;39:113–119.
    3. Begun D, Holloway A, Stevens K, Hillier L, Poh Y, et al. Population genomics: whole-genome analysis of polymorphism and divergence in Drosophila simulans. PLoS Biol. 2007;5:e310.
    4. Clark R, Schweikert G, Toomajian C, Ossowski S, Zeller G, et al. Common sequence polymorphisms shaping genetic diversity in Arabidopsis thaliana. Science. 2007;317:338–342.
    5. Stringer C, Andrews P. Genetic and fossil evidence for the origin of modern humans. Science. 1988;239:1263–1268.
