PLoS One. 2009 Nov 18;4(11):e7888.
doi: 10.1371/journal.pone.0007888.

Genetic Variation and Recent Positive Selection in Worldwide Human Populations: Evidence From Nearly 1 Million SNPs

Free PMC article

David López Herráez et al. PLoS One. 2009.

Background: Genome-wide scans of hundreds of thousands of single-nucleotide polymorphisms (SNPs) have resulted in the identification of new susceptibility variants to common diseases and are providing new insights into the genetic structure and relationships of human populations. Moreover, genome-wide data can be used to search for signals of recent positive selection, thereby providing new insights into the genetic adaptations that occurred as modern humans spread out of Africa and around the world.

Methodology: We genotyped approximately 500,000 SNPs in 255 individuals (5 individuals from each of 51 worldwide populations) from the Human Genome Diversity Panel (HGDP-CEPH). When merged with non-overlapping SNPs typed previously in 250 of these same individuals, the resulting data consist of over 950,000 SNPs. We then analyzed the genetic relationships and ancestry of individuals without assigning them to populations, and we also identified candidate regions of recent positive selection at both the population and regional (continental) level.
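As a rough illustration of the merge step described above, the following Python sketch combines two per-platform genotype tables keyed by SNP identifier and keeps a SNP from the second platform only when it is absent from the first. The data layout, the function name `merge_non_overlapping`, and the toy SNP identifiers are hypothetical, and the sketch assumes both tables list the same individuals in the same order; it is not the authors' actual procedure.

```python
def merge_non_overlapping(primary, secondary):
    """Merge two SNP -> genotype-list mappings.

    Every SNP in `primary` is kept. A SNP from `secondary` is added only
    if its identifier does not already appear in `primary`.
    """
    merged = dict(primary)  # shallow copy so the input is not modified
    for snp_id, genotypes in secondary.items():
        if snp_id not in merged:
            merged[snp_id] = genotypes
    return merged


# Toy data (hypothetical identifiers): two individuals per SNP.
platform_a = {"rs1": ["AA", "AG"], "rs2": ["CC", "CT"]}
platform_b = {"rs2": ["CC", "CT"], "rs3": ["GG", "GT"]}

merged = merge_non_overlapping(platform_a, platform_b)
print(sorted(merged))
```

In a real merge, individuals typed on only one platform would have to be dropped or aligned first, which this sketch does not attempt.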

Conclusions: Our analyses both confirm and extend previous studies; in particular, we highlight the impact of various dispersals, and the role of substructure in Africa, on human genetic diversity. We also identified several novel candidate regions for recent positive selection, and a gene ontology (GO) analysis identified several GO groups that were significantly enriched for such candidate genes, including immunity and defense related genes, sensory perception genes, membrane proteins, signal receptors, lipid binding/metabolism genes, and genes involved in the nervous system. Among the novel candidate genes identified are two genes involved in the thyroid hormone pathway that show signals of selection in African Pygmies that may be related to their short stature.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.


Figure 1. Comparison of the observed heterozygosity per individual for Illumina vs. Affymetrix genotypes.
(A) For all SNPs genotyped on each platform. (B) For only those SNPs present on both platforms. (C) Box-and-whisker plot of heterozygosity values for each group, based on the entire dataset of 954,063 SNPs. (D) Box-and-whisker plot of heterozygosity values for each group, based on a pruned dataset of 220,247 SNPs in which SNPs in high LD were removed.
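The per-individual observed heterozygosity compared in Figure 1 can be sketched as the fraction of an individual's genotype calls that are heterozygous. The Python example below is a minimal illustration under assumed conventions: genotypes are two-character strings, `None` marks a missing call, and missing calls are excluded from the denominator. The function name and input format are hypothetical.

```python
def observed_heterozygosity(genotypes):
    """Fraction of non-missing genotype calls that are heterozygous.

    `genotypes` is a list of two-character strings such as "AG";
    None marks a missing call and is ignored.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return float("nan")  # no usable calls for this individual
    het = sum(1 for g in called if g[0] != g[1])
    return het / len(called)


# One toy individual: 4 called genotypes, 2 of them heterozygous.
individual = ["AA", "AG", None, "CT", "CC"]
print(observed_heterozygosity(individual))  # 0.5
```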
Figure 2. Plot of PC1 vs. PC2 for the 250 HGDP-CEPH individuals using the combined, non-overlapping set of 954,063 SNPs from the Affymetrix and Illumina platforms.
Population labels are abbreviated to the first 3 letters of the population name. Regional labels are: AM = Americas, CSA = Central/South Asia, EA = East Asia, EUR = Europe, ME = Middle East, OC = Oceania (Melanesia), SSA = Sub-Saharan Africa.
Figure 3. Information from additional PCs.
(A) Amount of variation explained and associated statistical significance, based on the TW statistic, for the first 15 PCs. (B) Heat plot of the value of each of the first 15 PCs for each of the 51 populations. The PC values have been normalized for each PC to range from 0 to 1.
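The caption states only that PC values were normalized to range from 0 to 1 for each PC. One common way to do this is min-max scaling, sketched below in Python; that this is the method used for the heat plot is an assumption, and the function name and toy values are hypothetical.

```python
def min_max_normalize(values):
    """Rescale a list of numbers linearly so the minimum maps to 0 and
    the maximum maps to 1. A constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Toy values of one PC across three populations.
pc_values = [-2.0, 0.0, 2.0]
normalized = min_max_normalize(pc_values)
print(normalized)  # [0.0, 0.5, 1.0]
```

Applying the function separately to each PC puts all PCs on the same color scale, regardless of how much variance each one explains.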
Figure 4. frappe results for K = 6.
Each color indicates a different ancestry component.
Figure 5. Plot of candidate regions for recent local selection on chromosome 2.
Each horizontal row is a population (A) or regional group (B), with abbreviations for population names given in Table S1. The numbers across the top indicate the position (in megabases) along the chromosome, and each light box indicates a candidate region of recent positive selection. The vertical red lines indicate the position of the EDAR and LCT genes. (A) For all 51 populations. (B) For six regional groups of populations (excluding sub-Saharan Africa).
Figure 6. Candidates for recent local selection shared by two or more regional groupings of populations.
Each row is a candidate region for recent positive selection, and each column is a regional grouping of populations. The darkness of the bar in each entry indicates the rank of that candidate region within the top 100 signals for that region (ordered by p-value), according to the scale bar at the bottom.
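The ranking and sharing logic behind Figure 6 can be sketched as follows: order each regional group's candidate regions by p-value, keep the top N, and report regions that appear in the top N of two or more groups along with their ranks. This Python example is a simplified illustration. It assumes regions carry identical identifiers across groups (real candidate regions would need to be matched by genomic overlap), and the function name, region labels, and p-values are hypothetical.

```python
from collections import defaultdict


def shared_top_candidates(pvalues_by_group, top_n=100, min_groups=2):
    """Find regions ranked within the top `top_n` (by ascending p-value)
    in at least `min_groups` groups.

    `pvalues_by_group` maps group -> {region_id: p_value}.
    Returns {region_id: {group: rank}} with ranks starting at 1.
    """
    ranks = defaultdict(dict)
    for group, pvals in pvalues_by_group.items():
        ordered = sorted(pvals, key=pvals.get)[:top_n]
        for rank, region in enumerate(ordered, start=1):
            ranks[region][group] = rank
    return {r: g for r, g in ranks.items() if len(g) >= min_groups}


# Toy p-values for two regional groups (made-up numbers).
toy = {
    "EUR": {"regionA": 0.001, "regionB": 0.02, "regionC": 0.5},
    "EA": {"regionA": 0.01, "regionD": 0.002},
}
shared = shared_top_candidates(toy, top_n=2)
print(shared)  # {'regionA': {'EUR': 1, 'EA': 2}}
```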
Figure 7. Signals of selection for thyroid hormone pathway genes in African Pygmies.
(A) 1 Mb region around TRIP4 in Mbuti Pygmies. (B) 500 Kb region around IYD in Biaka Pygmies. In each panel, the top part is the region surrounding the candidate gene with the position of genes indicated, followed by the distribution of the iES, RsbAR, and Sweepfinder statistics, followed by a diagram of the haplotypes observed in either Mbuti (A) or Biaka (B) Pygmies, where each row is a haplotype, each column is a SNP, and the light vs. dark shading indicates the alternative alleles for each SNP.



