Phylogeography of mtDNA Haplogroup R7 in the Indian Peninsula


Gyaneshwer Chaubey et al. BMC Evol Biol.


Background: Human genetic diversity observed in the Indian subcontinent is second only to that of Africa. This implies an early settlement and demographic growth soon after the first 'Out-of-Africa' dispersal of anatomically modern humans in the Late Pleistocene. In contrast to this perspective, linguistic diversity in India has been thought to derive from more recent population movements and episodes of contact. With the exception of Dravidian, whose origin and relatedness to other language phyla are obscure, all the language families in India can be linked to language families spoken in different regions of Eurasia. Mitochondrial DNA and Y chromosome evidence has supported largely local evolution of the genetic lineages of the majority of Dravidian and Indo-European speaking populations, but there is no consensus yet on whether the Munda (Austro-Asiatic) speaking populations originated in India or derive from a relatively recent migration from further east.

Results: Here, we report the analysis of 35 novel complete mtDNA sequences from India, which refine the structure of Indian-specific varieties of haplogroup R. Detailed analysis of haplogroup R7, coupled with a survey of approximately 12,000 mtDNAs from caste and tribal groups over the entire Indian subcontinent, reveals that one of its more recently derived branches (R7a1) is particularly frequent among Munda-speaking tribal groups. This branch is nested within diverse R7 lineages found among Dravidian and Indo-European speakers of India. We infer from this that a subset of Munda-speaking groups acquired R7 relatively recently. Furthermore, we find that the distribution of R7a1 within the Munda speakers is largely restricted to one of the sub-branches (Kherwari) of northern Munda languages. This evidence does not support the hypothesis that the Austro-Asiatic speakers are the primary source of the R7 variation. Statistical analyses suggest a significant correlation between genetic variation and geography, rather than between genes and languages.
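The abstract does not name the test used to compare genetic variation with geography and language. One standard option for correlating two distance matrices is a Mantel permutation test, sketched below under that assumption. The function names and the five-population toy matrices are hypothetical and are not data from the study.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mantel(dist_a, dist_b, permutations=999, seed=0):
    """Return (r, p) for a one-sided Mantel test of dist_a against dist_b.

    Rows and columns of dist_b are permuted together, which keeps each
    population's set of distances intact while breaking the pairing.
    """
    rng = random.Random(seed)
    n = len(dist_a)
    flat_a = upper_triangle(dist_a)
    observed = pearson(flat_a, upper_triangle(dist_b))
    at_least_as_large = 1  # the observed arrangement counts as one permutation
    order = list(range(n))
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [[dist_b[order[i]][order[j]] for j in range(n)]
                    for i in range(n)]
        if pearson(flat_a, upper_triangle(permuted)) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / (permutations + 1)


# Hypothetical toy data: five populations placed along a line (arbitrary units),
# with genetic distance set proportional to geographic distance.
positions = [0, 1, 3, 6, 10]
toy_geographic = [[abs(a - b) for b in positions] for a in positions]
toy_genetic = [[0.01 * d for d in row] for row in toy_geographic]

r, p = mantel(toy_geographic, toy_genetic, permutations=199, seed=1)
print(f"Mantel r = {r:.3f}, permutation p = {p:.3f}")
```

Running the same function with a linguistic distance matrix in place of the geographic one would give the comparison between genes and languages; how linguistic distance is encoded is a separate modelling choice that the abstract does not specify.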

Conclusion: Our high-resolution phylogeographic study, involving diverse linguistic groups in India, suggests that the high frequency of mtDNA haplogroup R7 among Munda-speaking populations of India is best explained by gene flow from linguistically different populations of the Indian subcontinent. The conclusion is based on the observation that among Indo-Europeans, and particularly in Dravidians, the haplogroup is, despite its lower frequency, phylogenetically more divergent, while among the Munda speakers only one sub-clade of R7, i.e. R7a1, can be observed. It is noteworthy that although R7 is autochthonous to India and arises from the root of hg R, its distribution and phylogeography in India are not uniform. This suggests the more ancient establishment of an autochthonous matrilineal genetic structure, and that isolation in the Pleistocene, lineage loss through drift, and endogamy of prehistoric and historic groups have greatly inhibited genetic homogenization and geographical uniformity.


Figure 1
The most parsimonious tree of haplogroup R7 complete mtDNA sequences observed in the Indian subcontinent. This tree was redrawn manually from the output of the median-joining/reduced-median network obtained using the NETWORK program (version 4.1) [34]. The samples were selected through a preliminary sequence analysis of the control region in order to include the widest possible range of R7 variation, language and geographical groups. Coalescent times were calculated by a calibration method described elsewhere [32]. The 16182C, 16183C and 16519 polymorphisms were omitted. Suffixes A, C, G, and T indicate transversions; recurrent mutations are underlined. Synonymous (s) and non-synonymous (ns) mutations are distinguished. DRA-Dravidian, AA-Austro-Asiatic, IE-Indo-European. The ethnic affiliation of the samples is as follows: Lam, Lambadi; As, Asur; Mw, Mawasi; Tor45, Pakistan; Ho, Ho; Ori&A, Oraon; G19, Kanwar; G39, Santhal; G66, Gond; KO, Koya. Two sequences, T35 (Thogataveera) and C35 (Brahmin), were taken from the literature [4].
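The calibration method of [32] is not reproduced in this caption. As a generic illustration only, a common way to date a clade from such a tree is the rho statistic: the average number of mutations separating the sampled sequences from the clade's root, multiplied by a mutation rate. The sketch below assumes that approach; the haplotype counts and the `years_per_mutation` value are arbitrary placeholders, not figures from the study.

```python
def rho_statistic(haplotypes):
    """Average mutational distance from the root haplotype.

    haplotypes: list of (mutations_from_root, sample_count) pairs.
    """
    total_samples = sum(count for _, count in haplotypes)
    total_steps = sum(steps * count for steps, count in haplotypes)
    return total_steps / total_samples


def coalescent_age(rho, years_per_mutation):
    """Convert rho to years under an assumed constant mutation rate."""
    return rho * years_per_mutation


# Hypothetical clade: 4 samples on the root haplotype, 3 samples one step
# away, 2 samples two steps away, and 1 sample three steps away.
toy_clade = [(0, 4), (1, 3), (2, 2), (3, 1)]

rho = rho_statistic(toy_clade)
# 5000 years per mutation is a placeholder, not a published calibration.
age = coalescent_age(rho, years_per_mutation=5000)
print(f"rho = {rho:.2f}, age = {age:.0f} years")
```

A real estimate would also need a standard error for rho and a rate calibrated for the sequence range analysed, both of which depend on the method cited as [32].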
Figure 2
Principal component (PC) analysis of R5-8, R30 and R31 lineages in Indian populations. The Munda group and a few Indo-European/Dravidian populations collected from the states of Bihar, Jharkhand and Chhattisgarh predominantly cluster with haplogroup R7. Haplogroup frequencies were obtained from published sources [14] and our unpublished data.
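To illustrate how a PC analysis of haplogroup frequencies separates populations, the following minimal sketch computes the first principal component by power iteration on the covariance matrix. It is not the authors' procedure: the population labels and frequencies are hypothetical, and only three haplogroup columns are used to keep the example short.

```python
import math


def center_columns(rows):
    """Subtract each column's mean so every haplogroup has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[value - mean for value, mean in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n = len(centered)
    k = len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1)
             for j in range(k)] for i in range(k)]


def first_component(cov, iterations=200):
    """Leading eigenvector of cov, found by power iteration."""
    k = len(cov)
    # Start from the column with the largest variance; a uniform start
    # vector can lie in the null space when frequencies sum to a constant.
    start = max(range(k), key=lambda i: cov[i][i])
    vec = [cov[i][start] for i in range(k)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    return vec


# Hypothetical frequencies of three haplogroups (columns: R5, R6, R7).
labels = ["pop_A", "pop_B", "pop_C", "pop_D"]
frequencies = [
    [0.02, 0.01, 0.20],
    [0.03, 0.02, 0.18],
    [0.08, 0.05, 0.01],
    [0.07, 0.06, 0.02],
]

centered = center_columns(frequencies)
pc1 = first_component(covariance(centered))
scores = [sum(v * w for v, w in zip(row, pc1)) for row in centered]
for label, score in zip(labels, scores):
    print(f"{label}: PC1 score = {score:+.3f}")
```

In this toy input the two populations with high R7 frequency receive PC1 scores of one sign and the other two receive the opposite sign, which is the kind of clustering the figure describes.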
Figure 3
The reduced-median network of 152 mtDNAs belonging to haplogroup R7. Each sample represented on the diagram has been sequenced for the HVS-I region and genotyped for the coding region mutations that are indicated. Circle sizes are proportional to the number of mtDNAs with that haplotype. Recurrent mutations are underlined.
Figure 4
The frequency distribution of the R7a and R7b clades in the Indian subcontinent. The upper panel (a, b) shows the spatial distribution (%) of these clades in Indian populations. Isofrequency maps were generated with Surfer7 (Golden Software Inc., Golden, Colorado), following the Kriging procedure. These isofrequency maps illustrate the geographic spread of the respective mtDNA haplogroups. Note, however, that these illustrative maps should not be used to predict the frequency of a clade in geographical areas with missing data. The lower panel (c, d) depicts the frequencies of R7a and R7b in different social and language groups. DRA-Dravidian, AA-Austro-Asiatic, IE-Indo-European.
Figure 5
The frequency distribution of haplogroup R7 in different branches of the Austro-Asiatic language family of India [26].

