Sci Rep 6, 20768

Genomic Study of the Ket: A Paleo-Eskimo-related Ethnic Group With Significant Ancient North Eurasian Ancestry

Pavel Flegontov et al. Sci Rep.

Abstract

The Kets, an ethnic group in the Yenisei River basin, Russia, are considered the last nomadic hunter-gatherers of Siberia, and the Ket language has no transparent affiliation with any language family. We investigated connections between the Kets and Siberian and North American populations, with emphasis on the Mal'ta and Paleo-Eskimo ancient genomes, using original data from 46 unrelated samples of Kets and 42 samples of their neighboring ethnic groups (Uralic-speaking Nganasans, Enets, and Selkups). We genotyped over 130,000 autosomal SNPs, identified mitochondrial and Y-chromosomal haplogroups, and performed high-coverage genome sequencing of two Ket individuals. We established that Nganasans, Kets, Selkups, and Yukaghirs form a cluster of populations most closely related to Paleo-Eskimos in Siberia (not considering indigenous populations of Chukotka and Kamchatka). Kets are closely related to modern Selkups and to some Bronze and Iron Age populations of the Altai region, with all these groups sharing a high degree of Mal'ta ancestry. Implications of these findings for the linguistic hypothesis uniting Ket and Na-Dene languages into a language macrofamily are discussed.

Figures

Figure 1
(A) Admixture coefficients plotted for dataset ‘GenoChip + Illumina arrays’. Abbreviated names of admixture components are shown on the left as follows: SAM, South American; NAM, North American; ESK, Eskimo (Beringian); SEA, South-East Asian; SIB, Siberian; NEU, North European; ME, Middle Eastern; CAU, Caucasian; SAS, South Asian; OCE, Oceanian; AFR, African. The Ket-Uralic (‘Ket’) admixture component appears at K ≥ 11, and admixture coefficients are plotted for K = 4, 10, 11, and 19. Although K = 20 demonstrates the lowest average cross-validation error, the Ket-Uralic component splits in two at this K value; therefore, K = 19 was chosen for the final analysis. Only populations containing at least one individual with >5% of the Ket-Uralic component at K = 19 are plotted, and individuals are sorted according to values of the Ket-Uralic component. Admixture coefficients for the Saqqaq ancient genome are shown separately on the right, and those for two reference Kets and two Ket individuals from this study are shown on the left. (B) Average cross-validation (CV) error graph with standard deviations plotted. Ten-fold cross-validation was performed. The graph has a minimum at K = 20. (C) Color-coded values of the Ket-Uralic admixture component at K = 19 plotted on the world map using QGIS v.2.8. Maximum values in each population are taken, and only values >5% are plotted. Top five values of the component are shown in the bottom left corner, and the value for Saqqaq is shown on the map.
Figure 2
(A) A maximum likelihood tree with 6 migration edges computed on the dataset ‘Ket genomes + HumanOrigins’ with selected populations (194,750 SNPs, 39 populations, 527 individuals). Drift parameter is shown on the x-axis. (B) Residuals from the fit of the model to the data visualized. 98% of variance is explained by the tree.
Figure 3
(A) A maximum likelihood tree with 7 migration edges computed on the genome-based dataset without transitions. Edge weight and bootstrap support values are shown in the table, the drift parameter is shown on the x-axis, and bootstrap support values for tree nodes are indicated. Migration edges are numbered according to the order of their appearance in the sequence of trees from m = 0 to m = 8. Note to the figure: as migration edges and tree topology are inter-dependent in bootstrapped trees, bootstrap support for the edges in the original tree was calculated by summing up support for closely similar edges in bootstrapped trees. Below, these edge groups are listed for edges #1–7: 1/ Australian and/or Papuan → the (Nivkh, Han, Dai, Kinh) clade or any of its members; 2/ Greenlander Inuit or the (Greenlander, Aleutian) clade → Saqqaq and/or Late Dorset (optionally a wider clade with Nivkh); 3/ any clade containing African populations → any clade composed of Nivkh/Han/Dai/Kinh (optionally a wider clade with Late Dorset and/or Saqqaq and/or Iron Age Altai); 4/ any clade composed of Mal’ta/Afanasievo/Andronovo (optionally a wider clade with Aleutian and/or Mari) → Karasuk; 5/ Mal’ta (optionally a wider clade with Motala12/Afanasievo/Andronovo/Aleutian) → any clade composed exclusively of Native Americans and/or Greenlander; 6/ any clade composed exclusively of populations with European ancestry → Aleutian; 7/ Ket (optionally a wider clade with Karasuk and/or Iron Age Altai and/or Iron Age Russia) → Saqqaq and/or Late Dorset. (B) Residuals from the fit of the model to the data visualized. 96.72% of variance is explained by the tree.
Figure 4
(A) PC3 vs. PC4 plot for the dataset ‘Ket genomes + HumanOrigins array’. African populations are not shown. Populations are color-coded by geographic region or language affiliation (in the case of Siberian and Central Asian populations), and most relevant populations are differentiated by marker shapes. Ancient genomes are shown in black. For the corresponding PC1 vs. PC2 plot see Suppl. Fig. 6.7. (B) PC3 vs. PC4 plot, zoom on the Ket individuals. Here is a list of populations closest to Saqqaq based on the average Euclidean distances in the multi-dimensional space of ten principal components (distances in parentheses): Ket (0.022), Nganasan (0.025), Selkup (0.026), Yukaghir (0.028), Eskimo (0.032), Koryak (0.032), Mansi (0.032), Itelmen (0.033), Chukchi (0.033), Dolgan (0.035).
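The ranking of populations by distance to Saqqaq can be illustrated with a minimal sketch. It assumes that "average Euclidean distance" means the mean, over individuals in a population, of the distance from each individual to the target genome in principal-component space; the caption does not spell this out, and a centroid-based distance would be another reading. The function names and the toy two-dimensional coordinates are hypothetical and are not taken from the study, which used ten principal components.

```python
import math

def mean_distance_to_target(target, individuals):
    """Mean Euclidean distance from each individual's PC coordinates to the target."""
    return sum(math.dist(target, ind) for ind in individuals) / len(individuals)

def rank_populations(target, populations):
    """Return (population, mean distance) pairs sorted from closest to farthest."""
    scores = {
        name: mean_distance_to_target(target, inds)
        for name, inds in populations.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])

# Hypothetical toy data in 2 dimensions (the caption uses 10 PCs).
toy_target = (0.0, 0.0)
toy_populations = {
    "PopA": [(3.0, 4.0), (0.0, 0.0)],  # distances 5 and 0, mean 2.5
    "PopB": [(1.0, 0.0), (0.0, 1.0)],  # distances 1 and 1, mean 1.0
}

ranking = rank_populations(toy_target, toy_populations)
for name, dist in ranking:
    print(f"{name} ({dist:.3f})")
```

With real data, each coordinate tuple would hold the first ten principal-component values of one individual, and the target would be the projected ancient genome.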
Figure 5
Statistics f4 (Mal'ta, Yoruba; Y, X) (A), f4 (Ket, Yoruba; Y, X) (B) computed on the genome-based dataset with African, Australian and Papuan populations excluded. See the corresponding results for the dataset without transitions in Suppl. Figs 8.15 and 8.16, respectively. A matrix of color-coded Z-scores is shown, and ancient genomes are marked with asterisks. Z-score equals the number of standard errors by which the statistic differs from zero, and |Z| > 2.9 demonstrates that the statistic is significantly different from zero using Bonferroni correction for 27 independent tests (threshold p-value of 0.001852). Rows show Z-scores for f4 (Mal'ta, Yoruba; row, column) or f4 (Ket, Yoruba; row, column), vice versa for columns.
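The significance threshold in this caption can be checked with a short calculation. The sketch below assumes a family-wise error rate of 0.05, which is not stated in the caption but reproduces the quoted threshold p-value when divided by 27 tests. It also converts that p-value to a Z cutoff under a standard normal distribution; the quoted |Z| > 2.9 matches a one-tailed conversion, so that reading is an assumption as well. Function names are illustrative.

```python
from statistics import NormalDist

def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold under Bonferroni correction."""
    return alpha / n_tests

def z_cutoff(p_value, two_sided=False):
    """Z-score whose upper-tail (or two-tailed) probability equals p_value."""
    tail = p_value / 2 if two_sided else p_value
    return NormalDist().inv_cdf(1 - tail)

# Assumed family-wise alpha of 0.05; 27 tests as stated in the caption.
p_threshold = bonferroni_threshold(0.05, 27)
z_one_tailed = z_cutoff(p_threshold)
z_two_tailed = z_cutoff(p_threshold, two_sided=True)

print(f"threshold p-value: {p_threshold:.6f}")
print(f"one-tailed Z cutoff: {z_one_tailed:.2f}")
print(f"two-tailed Z cutoff: {z_two_tailed:.2f}")
```

Under these assumptions the one-tailed cutoff is close to 2.9, while a two-tailed conversion would give a somewhat stricter cutoff of about 3.1.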
