Sci Rep, 5, 15486

Y-chromosome Diversity Suggests Southern Origin and Paleolithic Backwave Migration of Austro-Asiatic Speakers From Eastern Asia to the Indian Subcontinent


Xiaoming Zhang et al. Sci Rep.


Analyses of an Asian-specific Y-chromosome lineage (O2a1-M95), the dominant paternal lineage in Austro-Asiatic (AA) speaking populations, who are found on both sides of the Bay of Bengal, led to two competing hypotheses about this group's geographic origin and migratory routes. One hypothesis posits the origin of the AA speakers in India and an eastward dispersal to Southeast Asia, while the other places the origin in Southeast Asia with a westward dispersal to India. Here, we collected samples of AA-speaking populations from mainland Southeast Asia (MSEA) and southern China, and genotyped 16 Y-STRs in 343 males who belong to the O2a1-M95 lineage. Combining our samples with previous data, we analyzed both Y-chromosome and mtDNA diversity and generated a comprehensive picture of the O2a1-M95 lineage in Asia. We demonstrated that the O2a1-M95 lineage originated in southern East Asia among the Daic-speaking populations ~20-40 thousand years ago and then dispersed southward to Southeast Asia after the Last Glacial Maximum before moving westward to the Indian subcontinent. This migration resulted in the current distribution of this Y-chromosome lineage in the AA-speaking populations. Further analysis of mtDNA diversity showed a different pattern, supporting a previously proposed sex-biased admixture of the AA-speaking populations in India.


Figure 1. Geographic locations of the studied populations in Asia that contain the O2a1-M95 lineage.
Populations are color-coded by language family. The figure was modified from our previous report using Microsoft PowerPoint 2011 (Microsoft Corporation, USA).
Figure 2. Frequency distribution, Uh diversity and phylogenetic structure of the O2a1-M95 lineages among Asian populations.
Contour maps show the frequency (A) and Y-STR Uh diversity (B) of lineage O2a1-M95 in Asia. Colored dots indicate the geographic locations of the analysed populations and correspond to those in Fig. 1; bars indicate the frequency and Uh diversity spectra, respectively. (C) Phylogenetic network of Y-STR haplotypes among O2a1-M95 populations, generated from the following 14 Y-STRs: DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS439, DYS448, DYS458, DYS635 and GATA H4. Circle size is proportional to the number of samples. The contour maps were generated using Surfer10 (Golden Software Inc., Golden, USA), and the network was constructed using the Network package.
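As a rough illustration of the diversity statistic mapped in panel B, the sketch below assumes that "Uh" denotes the standard unbiased haploid diversity, Uh = n/(n-1) × (1 − Σ pᵢ²), where pᵢ is the frequency of haplotype i among n sampled males. The caption does not state the estimator or software the authors used, so the formula, the function name and the toy haplotypes are all assumptions made for illustration only.

```python
from collections import Counter


def unbiased_haplotype_diversity(haplotypes):
    """Unbiased haploid diversity: n/(n-1) * (1 - sum of squared haplotype frequencies).

    `haplotypes` is a sequence of hashable haplotypes, e.g. tuples of
    Y-STR repeat counts, one entry per sampled male.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(haplotypes)
    sum_sq_freq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq_freq)


# Hypothetical toy sample (not data from the paper): each tuple holds
# repeat counts at three made-up Y-STR loci for one individual.
sample = [(14, 13, 29), (14, 13, 29), (15, 13, 30), (14, 12, 29)]
uh = unbiased_haplotype_diversity(sample)
print(round(uh, 4))
```

Under this definition a population in which every male carries the same haplotype scores 0, and one in which every haplotype is distinct scores 1.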
Figure 3. Comparison of coalescence ages of the O2a1-M95 lineages among different geographic populations.
The age of each geographic or linguistic group was calculated by taking the average of respective populations from supplementary Table S3.
Figure 4. NJ tree constructed from Y-STR variation among populations of different language families.
Different linguistic families are shown using different colors. Branch length values are indicated above the branch.
Figure 5. Map of principal component analysis (PCA) among Asian populations.
Populations of East Asia and South Asia were grouped by geographic region and language family, respectively. AA- and TB-speaking populations clustered closely with DR and IE populations in the lower left. The first and second components explain 15.25% and 7.10% of the genetic variance, respectively.



