Sci Rep, 7 (1), 1572

Human Ancestry Correlates With Language and Reveals That Race Is Not an Objective Genomic Classifier


Jennifer L Baker et al. Sci Rep.


Genetic and archaeological studies have established a sub-Saharan African origin for anatomically modern humans with subsequent migrations out of Africa. Using the largest multi-locus data set known to date, we investigated genetic differentiation of early modern humans, human admixture and migration events, and relationships among ancestries and language groups. We compiled publicly available genome-wide genotype data on 5,966 individuals from 282 global samples, representing 30 primary language families. The best evidence supports 21 ancestries that delineate genetic structure of present-day human populations. Independent of self-identified ethno-linguistic labels, the vast majority (97.3%) of individuals have mixed ancestry, with evidence of multiple ancestries in 96.8% of samples and on all continents. The data indicate that continents, ethno-linguistic groups, races, ethnicities, and individuals all show substantial ancestral heterogeneity. We estimated correlation coefficients ranging from 0.522 to 0.962 between ancestries and language families or branches. Ancestry data support the grouping of Kwadi-Khoe, Kx'a, and Tuu languages, support the exclusion of Omotic languages from the Afroasiatic language family, and do not support the proposed Dené-Yeniseian language family as a genetically valid grouping. Ancestry data yield insight into a deeper past than linguistic data can, while linguistic data provide clarity to ancestry data.
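The correlation between an ancestry and a language grouping can be illustrated with a minimal sketch. The abstract does not state which coefficient was used, so this assumes a Pearson correlation between each sample's ancestry proportion and a 0/1 indicator of whether the sample's language belongs to the grouping. The function names and the toy data are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def language_indicator(sample_families, group):
    """Return 1.0 for samples whose language family is in `group`, else 0.0."""
    return [1.0 if family in group else 0.0 for family in sample_families]


# Hypothetical toy data: one language family label per sample, and the
# proportion of Southern African ancestry in that sample (made-up values).
sample_families = ["Tuu", "Kx'a", "Kwadi-Khoe", "Omotic", "Mayan", "Turkic"]
southern_african = [0.90, 0.85, 0.70, 0.05, 0.00, 0.00]

# Correlation with a single family versus the combined grouping.
single = pearson(southern_african,
                 language_indicator(sample_families, {"Tuu"}))
combined = pearson(southern_african,
                   language_indicator(sample_families,
                                      {"Kwadi-Khoe", "Kx'a", "Tuu"}))

print(f"Tuu only: {single:.3f}")
print(f"Combined: {combined:.3f}")
```

In this toy example the combined grouping correlates more strongly with the ancestry than a single family does, which is the kind of comparison the abstract uses when it says ancestry data support or fail to support a proposed language grouping.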

Conflict of interest statement

The authors declare that they have no competing interests.


Figure 1
Ancestry analysis of the global data set. The 282 samples are labeled alternating in the left and right margins. The 21 ancestral components are Kalash (black), Southern Asian (dark goldenrod), South Indian (slate blue), Central African (magenta), Southern African (dark orchid), West-Central African (brown), Western African (tomato), Eastern African (orange), Omotic (yellow), Northern African (purple), Northern European (blue), Southern European (dark olive green), Western Asian (white), Arabian (light gray), Oceanian (salmon), Japanese (red), Southeastern Asian (coral), Northern Asian (aquamarine), Sino-Tibetan (green), Circumpolar (pink), and Amerindian (gray).
Figure 2
(A) The migration graph. TreeMix analysis suggests that migration events occurred between (1) Eastern African and Northern African ancestries; (2) Omotic ancestry and the node leading to Arabian, Northern African, Southern European, and Western Asian ancestries; and (3) Northern European ancestry and the node leading to Amerindian and Circumpolar ancestries. (B) Majority-rule consensus tree. The migration events were suppressed to emphasize the underlying topology.
Figure 3
Correlation of ancestry and language. (A) “Combined” refers to Kwadi-Khoe, Tuu, and Kx’a, previously referred to collectively as Khoisan. (B) “+” indicates the combination of the listed language plus all languages listed to the left. Tupian, Arawakan, Quechumaran, Mayan, and Uto-Aztecan are referred to collectively as Amerind. (C) “Combined” refers to Chukotko-Kamchatkan and Eskimo-Aleut, referred to collectively as Paleo-Siberian. Note that inclusion of Yeniseian worsens the correlation. (D) “Combined” refers to Mongolic, Turkic, and Tungusic, referred to collectively as Altaic.



