Proc Natl Acad Sci U S A, 108 (13), 5154-62

Hunter-gatherer Genomic Diversity Suggests a Southern African Origin for Modern Humans

Brenna M Henn et al. Proc Natl Acad Sci U S A.

Abstract

Africa is inferred to be the continent of origin for all modern human populations, but the details of human prehistory and evolution in Africa remain largely obscure owing to the complex histories of hundreds of distinct populations. We present data for more than 580,000 SNPs for several hunter-gatherer populations: the Hadza and Sandawe of Tanzania, and the ≠Khomani Bushmen of South Africa, including speakers of the nearly extinct N|u language. We find that African hunter-gatherer populations today remain highly differentiated, encompassing major components of variation that are not found in other African populations. Hunter-gatherer populations also tend to have the lowest levels of genome-wide linkage disequilibrium among 27 African populations. We analyzed geographic patterns of linkage disequilibrium and of population differentiation, as measured by FST, in Africa. The observed patterns are consistent with an origin of modern humans in southern Africa rather than in eastern Africa, as is generally assumed. Additionally, genetic variation in African hunter-gatherer populations has been significantly affected by interaction with farmers and herders over the past 5,000 y, through both severe population bottlenecks and sex-biased migration. However, African hunter-gatherer populations still maintain the highest levels of genetic diversity in the world.
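The abstract measures population differentiation with FST but does not name an estimator. As a concrete reference point only, here is a minimal Python sketch of one common choice, Hudson's estimator combined across SNPs as a ratio of sums. It assumes per-SNP allele frequencies and sample sizes are already computed. The function name `hudson_fst` is hypothetical and is not taken from the paper.

```python
from typing import Sequence


def hudson_fst(p1: Sequence[float], p2: Sequence[float], n1: int, n2: int) -> float:
    """Hudson's FST across SNPs, combined as a ratio of sums (illustrative sketch).

    p1, p2: per-SNP allele frequencies in populations 1 and 2.
    n1, n2: number of sampled chromosomes (2 x diploid individuals) per population.
    """
    if len(p1) != len(p2):
        raise ValueError("p1 and p2 must cover the same SNPs")
    num = 0.0
    den = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference minus the expected sampling noise in each sample
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Probability that two alleles drawn from different populations differ
        den += a * (1 - b) + b * (1 - a)
    # Summing before dividing keeps low-diversity SNPs from dominating the estimate
    return num / den


# Toy example: two SNPs fixed for opposite alleles in the two populations
print(hudson_fst([0.0, 1.0], [1.0, 0.0], n1=20, n2=20))  # prints 1.0
```

The sampling-noise terms mean the estimate can fall slightly below zero when two samples have nearly identical frequencies, which is expected behavior for an unbiased estimator rather than an error.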

Conflict of interest statement

The authors from 23andMe, Inc. (C.R.G., J.M.M., L.H., and J.L.M.) declare competing financial interests as employees and stockholders of 23andMe, Inc. SNP arrays designed by 23andMe were used to generate a unique dataset reported in this article. To our knowledge, affiliation with 23andMe, Inc. did not bias the results or their discussion in this article.

Figures

Fig. 1.
Ancestral population clusters in sub-Saharan Africa. An unsupervised clustering algorithm, ADMIXTURE (21), was used to analyze population structure among 12 sub-Saharan African populations using ≈461,000 autosomal SNPs. We plot k = 2, 4, 6, and 8 ancestral populations. European Tuscans were included to account for potential recent European admixture in South Africans. For display in this figure, we randomly chose a subset of 30 unrelated Maasai and Luhya individuals. At k = 4, all hunter-gatherer (HG) populations share an ancestry component (blue), and South African Bantu-speakers are likely to have recently absorbed 10–20% KhoeSan ancestry. At k = 8, the HG populations resolve into four distinct ancestral population clusters.
Fig. 2.
Genome-wide LD and FST in African populations. (A) Each line represents LD decay averaged across the populations within each of six geographic regions; regions are described in SI Appendix, Table S1. LD (r²) between SNPs was calculated in sliding 1-Mb windows, and the r² estimates were binned by the genetic distance between SNPs in 5-kb bins. Hunter-gatherer (HG) populations have the lowest LD curves (SI Appendix, Fig. S13 shows population-specific LD decay curves). (B) We assessed a possible point of origin by regressing LD onto geographic distance. The regression for the single best-fitting geographic origin, the point 14°S, 12°E, is shown (correlation coefficient, 0.78). (C) Map based on mean LD within 0–5 kb. Blue marks the highest correlation coefficients, that is, the best fit for a potential geographic origin. Crosses indicate the geographic coordinates of the sampled populations. (D) We assessed a possible point of origin by regressing FST between African populations and Europe onto geographic distance, using a waypoint in the Near East. Populations were grouped for the FST analysis; crosses indicate the geographic mean of each population grouping.
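Panels B and C search for the geographic point whose distance to the sampled populations best predicts their LD. The Python sketch below illustrates that kind of grid search under stated assumptions: straight great-circle (haversine) distances with no waypoints (so it does not reproduce the Near East waypoint used in panel D), Pearson correlation as the fit criterion, and toy coordinates and statistics. The function names and data are hypothetical and are not the authors' pipeline.

```python
import itertools
import math
from typing import Iterable, Optional, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def best_origin(
    pops: Sequence[Tuple[float, float]],
    stat: Sequence[float],
    candidates: Iterable[Tuple[float, float]],
) -> Tuple[Optional[Tuple[float, float]], float]:
    """Return the candidate (lat, lon) whose distances to the populations correlate
    most positively with the per-population statistic (e.g., mean LD)."""
    best_point, best_r = None, -math.inf
    for lat, lon in candidates:
        dists = [haversine_km(lat, lon, plat, plon) for plat, plon in pops]
        r = pearson_r(dists, stat)
        if r > best_r:  # NaN comparisons are False, so degenerate points are skipped
            best_point, best_r = (lat, lon), r
    return best_point, best_r


# Toy data: the statistic is exactly the distance from a hidden origin at (0, 20)
pops = [(-30, 20), (10, 20), (0, 40), (0, 0), (-15, 35)]
stat = [haversine_km(0, 20, la, lo) for la, lo in pops]
grid = itertools.product(range(-35, 16), range(-20, 52))  # 1-degree grid over Africa
point, r = best_origin(pops, stat, grid)
print(point, round(r, 6))
```

A positive correlation is the target here because, under a serial-founder model, LD is expected to rise with distance from the origin; with real data the maximum correlation would be well below 1, as in the 0.78 reported in panel B.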
Fig. 3.
Local ancestry assignment along phased chromosomes. Two individuals with potential admixture (Fig. 1) were projected onto the principal component space of three putative ancestral populations. The three ancestral populations differed for the Sandawe (SWE, Sandawe; HAD, Hadza; LWK, Luhya Bantu) and the South African ≠Khomani Bushmen (SAN, KhoeSan; TSI, European Tuscan; LWK, Luhya Bantu). Ancestry was assigned in 40-SNP windows along the phased chromosomes (haplotypes A and B) by calculating the minimal distance to an ancestral population (28). Admixture with Bantu-speaking agriculturalists appears to have occurred relatively recently, as indicated by the many segments ≥10 Mb long. Switch errors in phasing could shorten these migrant tracts, but at low levels of admixture, phase switch errors are less likely to lengthen the inferred migrant tracts.
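The window-by-window assignment described in this caption can be illustrated with a simplified Python sketch. It is not the method of reference (28). It skips the principal-component projection and instead uses a plain Euclidean distance between a haplotype's alleles and each reference population's allele frequencies in each 40-SNP window. It then merges adjacent windows with the same label into tracts whose lengths can be compared, for example against a 10-Mb threshold. The function names and toy frequencies are hypothetical; the labels SWE and LWK are borrowed from the caption.

```python
import math
from typing import Dict, List, Sequence, Tuple


def assign_windows(
    hap: Sequence[int],
    ref_freqs: Dict[str, Sequence[float]],
    window: int = 40,
) -> List[str]:
    """Label each window of one phased haplotype with its nearest reference population.

    hap: 0/1 alleles along the haplotype.
    ref_freqs: per-SNP allele frequencies for each putative ancestral population.
    """
    labels = []
    for start in range(0, len(hap), window):
        seg = hap[start:start + window]
        best_pop, best_d = "", math.inf
        for pop, freqs in ref_freqs.items():
            ref = freqs[start:start + window]
            # Euclidean distance between haplotype alleles and population frequencies
            d = math.sqrt(sum((h - f) ** 2 for h, f in zip(seg, ref)))
            if d < best_d:
                best_pop, best_d = pop, d
        labels.append(best_pop)
    return labels


def merge_tracts(
    labels: Sequence[str], positions: Sequence[int], window: int = 40
) -> List[Tuple[str, int, int]]:
    """Merge consecutive windows with the same label into (label, start_bp, end_bp)."""
    tracts: List[Tuple[str, int, int]] = []
    for i, lab in enumerate(labels):
        start = positions[i * window]
        end = positions[min((i + 1) * window, len(positions)) - 1]
        if tracts and tracts[-1][0] == lab:
            tracts[-1] = (lab, tracts[-1][1], end)  # extend the current tract
        else:
            tracts.append((lab, start, end))
    return tracts


# Toy data: 120 SNPs spaced 1 kb apart; the haplotype looks "SWE", then "LWK"
positions = [i * 1000 for i in range(120)]
refs = {"SWE": [0.1] * 120, "LWK": [0.9] * 120}
hap = [0] * 80 + [1] * 40
print(merge_tracts(assign_windows(hap, refs), positions))
# prints [('SWE', 0, 79000), ('LWK', 80000, 119000)]
```

Tract length matters because recombination breaks migrant segments into shorter pieces each generation, so long tracts point to recent admixture.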
Fig. 4.
Runs of homozygosity among Khoisan-speakers. (A) Long runs of homozygosity (ROH) were calculated for individuals in the Hadza, Sandawe, and ≠Khomani Bushmen populations. Runs were required to be at least 1 Mb long, and two missing genotypes were allowed per run. The cumulative length of ROH (cROH) is plotted for all individuals; the y-axis shows counts of individuals. The Hadza distribution differs markedly from those of the other two populations, with 65% (n = 11/17) of individuals having cROH >100 Mb. This distribution is consistent with a severe, recent bottleneck in the Hadza. (B) Simulated posterior distribution of effective population size (Ne) in the Hadza, generated with REJECTOR (29) by sampling Ne from a uniform distribution and retaining simulated parameter values whose fraction of the genome in ROH (fROH) fell within 20% of the observed fROH. (C) Simulated posterior distribution of bottleneck severity in the Hadza, modeled as in B.
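The ROH criteria in panel A (a 1-Mb minimum length and two missing genotypes allowed per run) can be illustrated with a simple scan. The Python sketch below is an assumption-laden illustration, not the software used in the paper. It treats a run as consecutive homozygous calls ended by any heterozygous call. It measures run length from the first to the last homozygous SNP and sums run lengths into cROH. Dedicated tools such as PLINK's `--homozyg` use more elaborate sliding-window rules. The function names `find_roh` and `croh` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def find_roh(
    positions: Sequence[int],
    genotypes: Sequence[Optional[int]],
    min_bp: int = 1_000_000,
    max_missing: int = 2,
) -> List[Tuple[int, int]]:
    """Return (start_bp, end_bp) runs of homozygosity for one individual.

    genotypes: 0 or 2 = homozygous, 1 = heterozygous, None = missing.
    A heterozygous call ends a run; up to `max_missing` missing calls are tolerated.
    """
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None  # index of the first homozygous SNP in the current run
    last_hom: Optional[int] = None  # index of the most recent homozygous SNP
    missing = 0
    for i, g in enumerate(genotypes):
        if g is None:
            if start is None:
                continue  # missing calls outside a run are ignored
            missing += 1
            if missing <= max_missing:
                continue
            # too many missing calls: fall through and close the run
        elif g != 1:
            if start is None:
                start, missing = i, 0
            last_hom = i
            continue
        # reached on a heterozygous call or on a missing call beyond the allowance
        if start is not None and positions[last_hom] - positions[start] >= min_bp:
            runs.append((positions[start], positions[last_hom]))
        start, last_hom, missing = None, None, 0
    if start is not None and positions[last_hom] - positions[start] >= min_bp:
        runs.append((positions[start], positions[last_hom]))
    return runs


def croh(runs: Sequence[Tuple[int, int]]) -> int:
    """Cumulative ROH length in bp; divide by the genome length scanned to get fROH."""
    return sum(end - start for start, end in runs)


# Toy data: 40 SNPs spaced 100 kb apart
positions = [i * 100_000 for i in range(40)]
geno: List[Optional[int]] = [0] * 40
geno[5] = geno[6] = None  # two missing calls: tolerated
geno[15] = 1  # heterozygous call: ends the first run
geno[20] = geno[21] = geno[22] = None  # three missing calls: ends a run too short to keep
runs = find_roh(positions, geno)
print(runs, croh(runs))  # prints [(0, 1400000), (2300000, 3900000)] 3000000
```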
