Mol Biol Evol, 28 (2), 1013-24

Population Genetic Structure in Indian Austroasiatic Speakers: The Role of Landscape Barriers and Sex-Specific Admixture



Gyaneshwer Chaubey et al. Mol Biol Evol.

Abstract

The geographic origin and time of dispersal of Austroasiatic (AA) speakers, presently settled in south and southeast Asia, remain disputed. Two rival hypotheses, both assuming a demic component to the language dispersal, have been proposed. The first places the origin of Austroasiatic speakers in southeast Asia, with a later dispersal to south Asia during the Neolithic, whereas the second advocates pre-Neolithic origins and dispersal of this language family from south Asia. To test the two alternative models, this study combines the analysis of uniparentally inherited markers with that of 610,000 common single nucleotide polymorphism loci from the nuclear genome. Indian AA speakers have high frequencies of Y chromosome haplogroup O2a; our results show that this haplogroup has significantly higher diversity and an older coalescent time (17-28 thousand years ago) in southeast Asia, strongly supporting the first of the two hypotheses. Nevertheless, the results of principal component and "structure-like" analyses on autosomal loci also show that the population history of AA speakers in India is more complex, being characterized by two ancestral components: one reflected in the Y chromosomal and EDAR results and the other in mitochondrial DNA diversity and genomic structure. We propose that AA speakers in India today derive from a dispersal out of southeast Asia, followed by extensive sex-specific admixture with local Indian populations.
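The abstract reports that haplogroup O2a shows "significantly higher diversity" in southeast Asia but does not name the statistic. One standard choice for Y-STR haplotypes is Nei's unbiased haplotype diversity, H = n/(n-1) * (1 - sum of p_i^2), where p_i is the frequency of the i-th distinct haplotype among n chromosomes. The sketch below computes it, assuming each haplotype is a tuple of repeat counts; it illustrates the idea and is not necessarily the measure the authors used. The sample haplotypes are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype (gene) diversity: n/(n-1) * (1 - sum(p_i^2)).

    `haplotypes` is any sequence of hashable labels, e.g. tuples of Y-STR
    repeat counts, one entry per sampled chromosome.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


# Hypothetical Y-STR haplotypes (repeat counts at two loci), for illustration only.
sample = [(14, 12), (14, 12), (15, 12), (16, 13)]
print(round(haplotype_diversity(sample), 3))  # 0.833
```

Values near 1 mean almost every chromosome carries a distinct haplotype; a sample of identical haplotypes gives 0.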

Figures

Fig. 1
(A) Language tree of the major subgroups of the Austroasiatic (AA) language family according to Diffloth (2009). The branching of the hypothetical extinct para-Munda languages Melluha and Kubha-Vipas is shown by a broken line. The branching pattern of the extant languages allows both south and southeast Asia to be considered equally plausible homelands for the initial spread of AA. According to Fuller (2007), acceptance of the extinct para-Munda branch would support an origin of AA in the Indian subcontinent. The map depicts the geographic distribution of the AA family (adapted from Diffloth 2001 and Anderson 2007, covering southeast Asia and India, respectively) and the sampling locations (to district-level precision) of the Indian AA samples. Numbers correspond to populations as given in table 1. Note that for India, only the regions of concentrated AA settlement are highlighted. Munda speakers are found at low frequencies throughout east India, so the few sampling locations outside the highlighted AA areas still represent AA populations. (B) Out-of-southeast-Asia and (C) out-of-India dispersal models. These two models represent alternative explanations for the spread of AA-speaking populations, all of which share rice-domestication vocabulary, in south and southeast Asia. According to model B, the AA family originated in southeast Asia; this model requires only one domestication event of rice, in East Asia. In contrast, model C implies the origin of the AA family and its initial split in India. According to this model, Oryza indica and Oryza japonica rice were independently domesticated in what are today India and China. Broken lines indicate recent gene flow between local Indian (Ind) non-AA groups and Munda speakers (Mun) in model B, and between Khasi-Aslian (Kh-As) and local East Asian (EAs)-derived populations.
Depending on the extent of the recent admixture, model B allows for preservation of some southeast Asian genetic ancestry among Munda, whereas no distinguishable Indian contribution is expected among Khasi-Aslian groups of southeast Asia. Conversely, model C assumes continuity of Munda groups in India with no specific east Asian contribution to their genes (apart from secondary gene flow from local Tibeto-Burman groups of India), whereas Khasi-Aslian would be expected to represent admixture between populations derived from the Indian subcontinent and southeast Asia.
Fig. 2
Scatter plot showing southeast Asian-specific lineages among different linguistic groups of India. Because the geographic distribution of Munda languages in India varies mainly along an east-west (longitudinal) gradient, frequencies of Y chromosome (left panel) and mtDNA (right panel) haplogroups are plotted against longitudinal distance (x axis). Mushar and Tharu, who now speak Indo-European languages but show exceptionally high levels of east Asian haplogroups for their linguistic affiliation, are marked with arrows. South Asian haplogroups: mtDNA M2–6, N5, M33–65, R5–8, and R31–32; Y chromosome C5, F, H, L, and R2. Southeast Asian haplogroups: mtDNA A–G, M7–12, R22, and N9; Y chromosome C2, C3, D, and M–O. Unresolved haplogroups: mtDNA M*, R*, and N*, including other lineages (for example, M31 and West Eurasian-specific lineages); Y chromosome C*, G, I–K*, P*, Q, and R1. Haplogroup frequencies and associated references are given in the supplementary information (supplementary tables S9 and S10, Supplementary Material online).
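The caption's three-way grouping of haplogroups can be turned into per-population frequencies mechanically. The sketch below transcribes the caption's lists into rules and classifies each label by its first letter and first number (so "M33a" falls under M33 and "O2a" under O). That parsing rule, the catch-all "unresolved" fallback, and the function names are assumptions made for illustration; real haplogroup nomenclature has edge cases this simple parser ignores. The example population is hypothetical.

```python
import re
from collections import Counter

# Rules transcribed from the Fig. 2 caption: (first letter, lowest number,
# highest number). (letter, None, None) means every lineage under that letter.
# Anything unmatched falls through to "unresolved", the caption's catch-all.
RULES = {
    "mtDNA": {
        "south_asian": [("M", 2, 6), ("N", 5, 5), ("M", 33, 65),
                        ("R", 5, 8), ("R", 31, 32)],
        "southeast_asian": [(c, None, None) for c in "ABCDEFG"]
                           + [("M", 7, 12), ("R", 22, 22), ("N", 9, 9)],
    },
    "Y": {
        "south_asian": [("C", 5, 5), ("F", None, None), ("H", None, None),
                        ("L", None, None), ("R", 2, 2)],
        "southeast_asian": [("C", 2, 2), ("C", 3, 3), ("D", None, None)]
                           + [(c, None, None) for c in "MNO"],
    },
}
REGIONS = ("south_asian", "southeast_asian", "unresolved")


def classify(haplogroup, marker):
    """Map a label such as 'M33a' (mtDNA) or 'O2a' (Y) to a Fig. 2 category."""
    match = re.match(r"([A-Z])(\d*)", haplogroup)
    if match is None:
        return "unresolved"
    letter = match.group(1)
    number = int(match.group(2)) if match.group(2) else None
    for region, rules in RULES[marker].items():
        for rule_letter, low, high in rules:
            if letter != rule_letter:
                continue
            # Whole-letter rules match any label; numbered rules need a number in range.
            if low is None or (number is not None and low <= number <= high):
                return region
    return "unresolved"


def regional_frequencies(haplogroups, marker):
    """Fraction of a population's samples falling into each Fig. 2 category."""
    counts = Counter(classify(h, marker) for h in haplogroups)
    return {region: counts[region] / len(haplogroups) for region in REGIONS}


# Hypothetical Y-chromosome sample of five men.
print(regional_frequencies(["O2a", "O2a", "R2", "H1", "J2"], "Y"))
```

Paragroups such as M*, R*, or C* carry no number, so they miss every numbered rule and end up "unresolved", as the caption specifies. Plotting each population's southeast Asian fraction against its longitudinal position would then reproduce the layout of the figure.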
Fig. 3
(A) PCA of Indian Austroasiatic, Dravidian, and Tibeto-Burman groups in the context of other Eurasian populations. The PCA was carried out with the smartpca program of the EIGENSOFT package, using default settings. After SNP filtering (see Materials and Methods for details), the combined data set comprised 615 samples and 189,533 SNPs. (B) Bar plot of individual ancestry estimates for the studied populations from a structure-like analysis with ADMIXTURE at K = 7.
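To make the idea behind panel A concrete, here is a minimal pure-Python sketch of a genotype PCA; it is not smartpca and omits its other default steps. Genotypes coded 0/1/2 are centred and scaled per SNP as (g - 2p)/sqrt(p(1 - p)), following the normalization described for EIGENSOFT. PC1 is then the leading eigenvector of X Xᵀ/m, found here by power iteration. The simple allele-frequency estimate, the iteration count, and the function names are assumptions. The usage data are simulated, not taken from the study.

```python
import math
import random


def normalize(genotypes):
    """Scale each 0/1/2 SNP column as (g - 2p) / sqrt(p(1 - p)).

    p is the plain sample allele frequency (a simplifying assumption).
    Monomorphic SNPs carry no information and are dropped.
    """
    n = len(genotypes)
    scaled_columns = []
    for column in zip(*genotypes):
        p = sum(column) / (2 * n)
        if p in (0.0, 1.0):
            continue
        scale = math.sqrt(p * (1 - p))
        scaled_columns.append([(g - 2 * p) / scale for g in column])
    return [list(row) for row in zip(*scaled_columns)]  # individuals x SNPs


def first_pc(genotypes, iterations=200, seed=0):
    """PC1 coordinates: leading eigenvector of X X^T / m via power iteration."""
    x = normalize(genotypes)
    n, m = len(x), len(x[0])
    gram = [[sum(a * b for a, b in zip(x[i], x[k])) / m for k in range(n)]
            for i in range(n)]
    rng = random.Random(seed)
    # Random start vector: an all-ones start would be mapped to zero, because
    # every normalized SNP column sums to zero across individuals.
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        w = [sum(g * vk for g, vk in zip(row, v)) for row in gram]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    return v


# Simulated example: two groups of 15 with mirrored allele frequencies.
rng = random.Random(1)
freqs = [rng.uniform(0.1, 0.9) for _ in range(300)]


def draw(fs):
    return [(rng.random() < f) + (rng.random() < f) for f in fs]


G = [draw(freqs) for _ in range(15)] + [draw([1 - f for f in freqs]) for _ in range(15)]
pc1 = first_pc(G)
```

With real data of the size in panel A (615 samples, 189,533 SNPs), an SVD routine or smartpca itself is the practical choice. The pure-Python loop only keeps the example dependency-free.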
Fig. 4
(A) Geographic distribution of EDAR 1540C allele frequency worldwide. The map was generated with Surfer8 (Golden Software Inc.) using the Kriging procedure. Red dots indicate sampling locations. (B) Geographic distribution of EDAR 1540C allele frequency in different groups of south and southeast Asia. Bubble size is proportional to frequency.
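Kriging fills in the map between sampling points by estimating each unsampled location as a weighted sum of the observed frequencies. The weights are chosen from a variogram that models how dissimilarity grows with distance. The sketch below implements ordinary kriging under stated assumptions: a linear variogram gamma(h) = h, planar Euclidean distance on raw coordinates, and no nugget. None of these are settings reported for Fig. 4, and the toy points and values are hypothetical.

```python
import math


def solve(matrix, rhs):
    """Solve a dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def ordinary_kriging(points, values, target, variogram=lambda h: h):
    """Estimate the value at `target` from sampled (x, y) `points` and `values`.

    The default linear variogram and raw-coordinate distances are
    illustrative assumptions, not the settings used for the published maps.
    """
    n = len(points)
    # Kriging system: [Gamma 1; 1^T 0] [w; mu] = [gamma_0; 1],
    # where the last row forces the weights to sum to one (unbiasedness).
    a = [[variogram(math.dist(p, q)) for q in points] + [1.0] for p in points]
    a.append([1.0] * n + [0.0])
    b = [variogram(math.dist(p, target)) for p in points] + [1.0]
    weights = solve(a, b)[:n]  # last entry is the Lagrange multiplier mu
    return sum(w * z for w, z in zip(weights, values))


# Hypothetical toy data: three sampling points with made-up frequencies.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
freq = [0.1, 0.5, 0.9]
print(ordinary_kriging(pts, freq, (0.5, 0.5)))
```

Ordinary kriging is an exact interpolator: at a sampled location it returns the observed value. Because the weights sum to one, a constant field is reproduced exactly. Evaluating `ordinary_kriging` over a grid of target points gives the kind of surface drawn in panel A.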
Fig. 5
Surfer maps showing (A) the frequency and (C) the mean microsatellite variance distributions of haplogroup O2a (M95) in south and southeast Asia. The maps were generated with Surfer8 (Golden Software Inc.) using the Kriging procedure. (B) Phylogenetic network relating Y-STR haplotypes within haplogroup O2a (M95). The network was constructed with the median-joining and maximum parsimony (MP) algorithms as implemented in the Network 4.5.0.2 program. Circle size is proportional to the number of samples.
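Panel C maps the mean microsatellite variance within O2a. A common way to compute it is to take the variance of repeat counts at each Y-STR locus across the chromosomes of a population, then average over loci. The sketch below does this, assuming equal-length tuples of repeat counts and sample (n - 1) variance. It also shows the rough conversion T ≈ V/μ generations, which holds under a single-step mutation model with a star-like genealogy. The caption does not say how the coalescent times in the abstract were obtained, so the conversion and the user-supplied `mutation_rate` are assumptions, not the authors' method.

```python
from statistics import fmean, variance


def mean_str_variance(haplotypes):
    """Average over loci of the sample variance in repeat counts.

    `haplotypes` is a list of equal-length tuples, one tuple of Y-STR repeat
    counts per chromosome, all assumed to belong to one haplogroup (e.g. O2a).
    At least two chromosomes are required.
    """
    loci = zip(*haplotypes)  # regroup repeat counts locus by locus
    return fmean(variance(counts) for counts in loci)


def age_in_generations(mean_variance, mutation_rate):
    """Rough age V / mu under a single-step mutation model and star genealogy.

    `mutation_rate` is per locus per generation and must be supplied by the
    user; no value is implied by the original text.
    """
    return mean_variance / mutation_rate


# Hypothetical haplotypes at two loci.
haps = [(14, 12), (15, 12), (16, 13)]
print(mean_str_variance(haps))
```

Because variance accumulates roughly linearly with time under this model, higher mean variance in one region is consistent with an older presence of the haplogroup there. That is the logic behind comparing the southeast Asian and Indian values.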
