Cell, 150 (3), 457-69

Evolutionary History and Adaptation From High-Coverage Whole-Genome Sequences of Diverse African Hunter-Gatherers


Joseph Lachance et al. Cell.


To reconstruct modern human evolutionary history and identify loci that have shaped hunter-gatherer adaptation, we sequenced the whole genomes of five individuals in each of three different hunter-gatherer populations at > 60× coverage: Pygmies from Cameroon and Khoesan-speaking Hadza and Sandawe from Tanzania. We identify 13.4 million variants, substantially increasing the set of known human variation. We find evidence of archaic introgression in all three populations, and the distribution of times to most recent common ancestor (TMRCA) for these regions is similar to that observed for introgressed regions in Europeans. Additionally, we identify numerous loci that harbor signatures of local adaptation, including genes involved in immunity, metabolism, olfactory and taste perception, reproduction, and wound healing. Within the Pygmy population, we identify multiple highly differentiated loci that play a role in growth and anterior pituitary function and are associated with height.


Figure 1. Genomic variation in African hunter-gatherers and other global populations
A) Hunter-gatherer populations sequenced in our study (five males per population). HapMap abbreviations of publicly available genomes are also listed. A and B) Numbers indicate how many variants belong to each subset of populations. C and D) Principal component analysis of 68 high-coverage genomes. Pygmy genomes are indicated by green, Hadza by blue, Sandawe by red, and non-hunter-gatherer genomes by gray circles. E) Neighbor joining tree based on pairwise identity-by-state matrix distances using high-coverage whole-genome sequences from 68 individuals. See also Table S1, Table S2, Figure S2, and Figure S4.
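The identity-by-state (IBS) distance matrix behind panel E can be illustrated with a minimal sketch. The caption does not give the exact distance definition, so the formula here is an assumption: genotypes are coded as alternate-allele counts (0, 1, 2), and the distance between two individuals is the mean of |g1 − g2| / 2 over sites where both are called. The function name `ibs_distance_matrix` and the use of `None` for missing calls are hypothetical choices for illustration.

```python
from itertools import combinations


def ibs_distance_matrix(genotypes):
    """Pairwise IBS distance between individuals.

    genotypes[i][s] is the alternate-allele count (0, 1, or 2) for
    individual i at site s, or None if the call is missing.
    The distance definition is an assumption, not the paper's stated method.
    """
    n = len(genotypes)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        diff_sum = 0
        n_sites = 0
        for a, b in zip(genotypes[i], genotypes[j]):
            if a is None or b is None:
                continue  # skip sites with a missing call in either individual
            diff_sum += abs(a - b)
            n_sites += 1
        # Each site contributes at most 2 differing alleles, so divide by 2 * sites.
        d = diff_sum / (2 * n_sites) if n_sites else float("nan")
        dist[i][j] = dist[j][i] = d
    return dist


if __name__ == "__main__":
    toy = [[0, 1, 2, None], [0, 1, 2, 2], [2, 1, 0, 0]]
    for row in ibs_distance_matrix(toy):
        print(row)
```

A symmetric matrix of this kind is the usual input to a neighbor-joining routine; the tree-building step itself is not sketched here.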
Figure 2. Times until most recent common ancestry and evidence of archaic introgression
A) TMRCA of top candidate regions (solid lines), and of all regions (dotted lines) for the Pygmy, Hadza, and Sandawe hunter-gatherer populations and two European populations. Note that TMRCA represents the estimated time of divergence between the anatomically modern human and candidate introgressed sequences (Supplemental Information). TMRCA for top candidate regions is significantly older than that of random genomic regions (Kruskal-Wallis test, p < 2.2×10⁻¹⁶), but TMRCA for top candidate regions does not differ significantly among populations (Kruskal-Wallis test, p = 1). B and C) STRUCTURE plots showing the proportion of ancestry for each individual based on the most likely number of subpopulations (K = 2 for putatively introgressed regions in Panel B and K = 3 for random regions in Panel C). For each population, a ‘virtual’ genome was constructed by concatenating sequence from individuals containing the putatively introgressed sequence (B) or from arbitrary individuals (C). Pi, Hi, and Si denote the virtual genomes constructed for the Pygmy, Hadza, and Sandawe samples, respectively. D) TMRCA of top candidate regions for introgression unique to a single hunter-gatherer population is significantly lower than TMRCA of regions shared between all hunter-gatherer populations (Wilcoxon rank sum test, p = 2.2×10⁻⁵). E) Genomic distribution of the top 350 introgressed regions for the Pygmy, Hadza, and Sandawe populations and two European populations, in 2Mb windows. Colors indicate whether windows contain introgressed regions from a single hunter-gatherer population (orange), multiple hunter-gatherer populations (blue), or hunter-gatherer and European populations (open black circle). Counts are for hunter-gatherer regions only. See also Figure S7.
Figure 3. Characteristics of S* in real and simulated data
A–B) Neanderthal variants are not enriched in top candidate regions for three hunter-gatherer populations (A), but are enriched in top candidate regions from two European populations (B). C) TMRCA estimates for top 0.5% of 50kb regions in simulated data, varying time of split with the archaic population from 300kya to 1000kya; introgression was simulated into Europeans (white boxes) and Yorubans (gray boxes). See also Figure S7.
Figure 4. Divergent genomic regions between hunter-gatherers and non-hunter-gatherers and genomic distributions of ancestry informative markers (AIMs)
Each dot represents a non-overlapping 100kb window. Colors correspond to different chromosomes. For each population, genes found in the top 10 windows are listed in bold. If no genes are present in a top 10 window, the nearest gene is listed in normal font. A–C) Number of LSBL outliers (top 1%) per 100kb window. D–F) Number of AIMs per 100kb window. See also Table S5 and Table S6.
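The per-window counts plotted in panels A–C can be illustrated with a short sketch, assuming one score per variant on a single chromosome. The caption does not say how the top 1% cutoff was applied, so ranking all scores and taking the highest-scoring fraction is an assumption, as are the function name `count_outliers_per_window` and its parameters.

```python
import math
from collections import Counter


def count_outliers_per_window(positions, scores, window_size=100_000,
                              top_fraction=0.01):
    """Count top-scoring variants in non-overlapping windows.

    positions: base-pair coordinates on one chromosome.
    scores: one per-variant statistic (e.g. an LSBL value) per position.
    Returns {window_index: outlier_count}; windows with no outliers are omitted.
    How the cutoff is chosen here is an assumption for illustration.
    """
    if len(positions) != len(scores):
        raise ValueError("positions and scores must have the same length")
    if not scores:
        return {}
    n_outliers = max(1, math.ceil(len(scores) * top_fraction))
    # Rank variant indices from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Integer division assigns each outlier to its non-overlapping window.
    counts = Counter(positions[i] // window_size for i in ranked[:n_outliers])
    return dict(counts)


if __name__ == "__main__":
    pos = [i * 1000 for i in range(200)]
    val = [float(i) for i in range(200)]
    print(count_outliers_per_window(pos, val))
```

Window index `k` covers coordinates `k * window_size` through `(k + 1) * window_size - 1`. The same binning would apply to AIM counts (panels D–F) by passing AIM positions directly rather than ranked scores.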
Figure 5. Pygmy AIMs, allele frequencies, and height associations
A) Pygmy AIMs (green lines) located on chromosome 3. Green shading in the LSBL plot indicates genomic regions with an excess of LSBL outliers near the 3p14.3 and 3p11.2 AIM clusters. B) Allele frequencies of Pygmy AIMs genotyped in a broad sample of African Pygmy and Bantu individuals. Significant associations (p < 0.05) with height are indicated for males (green asterisks) and for both sexes pooled together (black asterisks). Sample sizes are also listed (n = number of genotyped individuals per population). See also Table S7.
