The Genetic Legacy of the Expansion of Turkic-speaking Nomads Across Eurasia


The Turkic peoples represent a diverse collection of ethnic groups defined by the Turkic languages. These groups have dispersed across a vast area, including Siberia, Northwest China, Central Asia, East Europe, the Caucasus, Anatolia, the Middle East, and Afghanistan. The origin and early dispersal history of the Turkic peoples are disputed, with candidates for their ancient homeland ranging from the Transcaspian steppe to Manchuria in Northeast Asia. Previous genetic studies have not identified a clear-cut unifying genetic signal for the Turkic peoples, which lends support to language replacement rather than demic diffusion as the model for the expansion of the Turkic languages. We addressed the genetic origin of 373 individuals from 22 Turkic-speaking populations, representing their current geographic range, by analyzing genome-wide high-density genotype data. In agreement with the elite dominance model of language expansion, most of the Turkic peoples studied genetically resemble their geographic neighbors. However, western Turkic peoples sampled across West Eurasia shared an excess of long chromosomal tracts that are identical by descent (IBD) with populations from present-day South Siberia and Mongolia (SSM), an area where historians center a series of early Turkic and non-Turkic steppe polities. While SSM-matching IBD tracts (> 1 cM) are also observed in non-Turkic populations, Turkic peoples demonstrate a higher percentage of such tracts (p-values ≤ 0.01) compared to their non-Turkic neighbors. Finally, we used the ALDER method and inferred admixture dates (~9th-17th centuries) that overlap with the Turkic migrations of the 5th-16th centuries. Thus, our results indicate historical admixture among Turkic peoples, and the recent shared ancestry with modern populations in SSM supports one of the hypothesized homelands for their nomadic Turkic and related Mongolic ancestors.

Conflict of interest statement

The authors have declared that no competing interests exist.


Fig 1. Geographic map of samples included in this study and linguistic tree of Turkic languages.
Panel A) Non-Turkic-speaking populations are shown with light blue, light green, dark green, light brown, and yellow circles, depending on the region. Turkic-speaking populations are shown with red circles regardless of the region of sampling. Full population names are given in S1 Table. Panel B) The linguistic tree of Turkic languages is adapted from Dybo 2004 and includes only those languages spoken by the Turkic peoples analyzed in this study. The x-axis shows the time scale in kilo-years (kya). Internal branches are shown with different colors.
Fig 2. Population structure inferred using ADMIXTURE analysis.
ADMIXTURE results at K = 8 are shown. Each individual is represented by a vertical (100%) stacked column indicating the proportions of ancestry in K constructed ancestral populations. Turkic-speaking populations are shown in red. The upper barplot shows only Turkic-speaking populations.
Fig 3. Populations with high and correlated signals of IBD sharing with western Turkic peoples.
Circle positions correspond to population locations. Circle color indicates the amount of excess IBD sharing (shown in the legend) that a population shares with all 12 western Turkic populations. Populations with IBD sharing exceeding the 0.90 quantile are shown with a “plus symbol”. Panel A) IBD sharing signal based on IBD tracts of 1–2 cM. Panel B) IBD sharing signal based on IBD tracts of 2–3 cM. Panel C) IBD sharing signal based on IBD tracts of 3–4 cM.
Fig 4. Pairwise IBD sharing based on 1–2 cM long segments.
For each population ordered along the x-axis, IBD sharing is computed with three SSM populations (Tuvans, Buryats, Mongols) and Evenkis. Each Turkic-speaking population (shown in red) is grouped with its respective geographic neighbors using parentheses. The grouped geographic neighbors were pooled and used to perform a permutation test as described in the Materials and Methods (M&M) section. Red numbers under the Turkic population name indicate how many SSM populations demonstrate a statistically significant excess of IBD sharing with a given Turkic population. Note that some Turkic populations share the same set of geographic neighbors; for example, Bashkirs, Tatars, and Chuvashes.
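The permutation test mentioned in this caption can be illustrated with a minimal sketch. The exact procedure is given in the M&M section, which is not reproduced here, so everything below is an assumption: a generic one-sided label-permutation test on per-individual IBD sharing values, asking whether a Turkic population shares more IBD with an SSM population than its pooled neighbors do. The function name, the input format, and the number of permutations are hypothetical.

```python
import random


def permutation_p_value(turkic, neighbors, n_perm=10000, seed=0):
    """One-sided permutation p-value for mean(turkic) > mean(neighbors).

    turkic, neighbors: per-individual IBD sharing values with one SSM
    population (hypothetical input format, not taken from the study).
    """
    rng = random.Random(seed)
    n_t = len(turkic)
    n_n = len(neighbors)
    observed = sum(turkic) / n_t - sum(neighbors) / n_n

    pooled = list(turkic) + list(neighbors)
    hits = 0
    for _ in range(n_perm):
        # Reassign the "Turkic" label at random among all individuals.
        rng.shuffle(pooled)
        diff = sum(pooled[:n_t]) / n_t - sum(pooled[n_t:]) / n_n
        if diff >= observed:
            hits += 1

    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Illustrative values only; these are not data from the study.
p_excess = permutation_p_value(
    [5.0, 5.2, 4.9, 5.1, 5.3, 5.0],
    [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.3, 1.1],
)
p_null = permutation_p_value([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```

Under this sketch, the red number under a Turkic population name would correspond to how many of the SSM populations yield a p-value below the chosen significance threshold.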
Fig 5. Admixture dates for Turkic-speaking populations on an absolute date scale.
Blue circles show ALDER-inferred point estimates and error bars indicate 95% confidence intervals. Gray circles show SPCO-inferred point estimates and error bars in gray indicate 95% confidence intervals. The red bar shows the point estimate range (inferred using ALDER) across all the analyzed samples and the orange bar shows the same for SPCO-inferred dates. Admixture dates before Common Era (CE) are shown with a negative sign.
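Admixture dating methods of this kind estimate time in generations, so placing the estimates on an absolute date scale requires a conversion. The sketch below shows one simple way to do that. The generation time of 30 years and the sampling year of 2000 are illustrative assumptions, not values taken from the study, and the function names are hypothetical. Dates before the Common Era come out negative, matching the convention in the caption.

```python
def generations_to_year(generations, years_per_generation=30.0,
                        sampling_year=2000):
    """Convert an admixture time in generations to a calendar year.

    Both default values are assumptions made for illustration only.
    A negative result denotes a date before the Common Era.
    """
    return sampling_year - generations * years_per_generation


def interval_to_years(gen_low, gen_high, **kwargs):
    """Convert a confidence interval in generations to calendar years.

    More generations ago means an earlier year, so the bounds swap.
    """
    earliest = generations_to_year(gen_high, **kwargs)
    latest = generations_to_year(gen_low, **kwargs)
    return earliest, latest


# Illustrative inputs only; these are not estimates from the study.
point = generations_to_year(20)
bce_example = generations_to_year(100)
ci = interval_to_years(15, 25)
```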
Fig 6. Admixture dates for simulated populations.
Simulated populations were generated by mixing two ancestral populations G generations ago as described in the Materials and Methods (M&M) section. We repeated each admixture scenario 120 times and analyzed the results with two admixture dating methods: ALDER and SPCO. Each circle represents the admixture date for one simulated population, and circle color indicates the method of admixture inference as shown in the legend. Red “plus symbols” show the true admixture date.

This work was supported by the European Union European Regional Development Fund through the Centre of Excellence in Genomics for the Estonian Biocentre and the University of Tartu, by the Estonian Institutional Research grant IUT24-1, by the European Commission grant 205419 ECOGENE to the EBC, by the Estonian Science Foundation grant no. 8973, and by the Estonian Basic Research Grant SF 0270177s08; the Russian Federation President Grant for young scientists (MK-2845.2014.4) to BY; the Russian Academy of Sciences Program for Fundamental Research "Biodiversity and dynamics of gene pools" to EK; the Federal Agency of Education and Science of the Russian Federation (state contracts 02.740.11.0701 and P325 to EK); the Russian Foundation for Basic Research (grant number 11-04-00652_a to EK); the Russian Foundation for Humanities (grant number 13-11-02014/U to EK); the Committee for Coordination Science and Technology Development of Republic of Uzbekistan (grant number FA-A6-T180 to ST). EGC-UT received targeted financing from Estonian Government SF0180142s08, Center of Excellence in Genomics (EXCEGEN), and University of Tartu (SP1GVARENG). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.