Am J Hum Genet. 89 (6), 731-44

Shared and Unique Components of Human Population Structure and Genome-Wide Signals of Positive Selection in South Asia

Mait Metspalu et al. Am J Hum Genet.

Erratum in

  • Am J Hum Genet. 2012 Feb 10;90(2):378-9

Abstract

South Asia harbors one of the highest levels of genetic diversity in Eurasia, which could be interpreted as a result of its long-term large effective population size and of admixture during its complex demographic history. In contrast to Pakistani populations, populations of Indian origin have been underrepresented in previous genomic scans of positive selection and population structure. Here we report data for more than 600,000 SNP markers genotyped in 142 samples from 30 ethnic groups in India. Combining our results with other available genome-wide data, we show that Indian populations are characterized by two major ancestry components, one of which is spread at comparable frequency and haplotype diversity in populations of South and West Asia and the Caucasus. The second component is more restricted to South Asia and accounts for more than 50% of the ancestry in Indian populations. Haplotype diversity associated with these South Asian ancestry components is significantly higher than that of the components dominating the West Eurasian ancestry palette. Modeling of the observed haplotype diversities suggests that both Indian ancestry components are older than the purported Indo-Aryan invasion 3,500 YBP. Consistent with the results of pairwise genetic distances among world regions, Indians share more ancestry signals with West than with East Eurasians. However, compared to Pakistani populations, a higher proportion of their genes show regionally specific signals of high haplotype homozygosity. Among such candidates of positive selection in India are MSTN and DOK5, both of which have potential implications in lipid metabolism and the etiology of type 2 diabetes.

Figures

Figure 1
Matrix of Pairwise Mean FST Values of Regional Groupings of the Studied Populations. Average of intergroup FST values (where the regional group is composed of multiple populations) is given in the diagonal. Central India is itself a composite of two regional groupings of samples from different populations that makes the negative intergroup FST uninformative.
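The averaging described in this caption can be illustrated with a minimal sketch. It assumes a population-by-population FST matrix is already available and that each population carries a region label. The function name `regional_mean_fst`, the toy values, and the reading of the diagonal as the mean over distinct population pairs within a region are illustrative assumptions, not the authors' code.

```python
import numpy as np

def regional_mean_fst(fst, regions):
    """Collapse a population-level FST matrix into a region-level matrix.

    fst     : square symmetric array of pairwise FST between populations
    regions : region label for each population, in the same order as fst
    Off-diagonal cells hold the mean FST over all population pairs drawn
    from two different regions. Diagonal cells hold the mean FST over
    distinct population pairs inside one region (NaN for a region with a
    single population). This reading of the diagonal is an assumption.
    """
    fst = np.asarray(fst, dtype=float)
    labels = sorted(set(regions))
    members = {r: [k for k, x in enumerate(regions) if x == r] for r in labels}
    out = np.full((len(labels), len(labels)), np.nan)
    for a, ra in enumerate(labels):
        for b, rb in enumerate(labels):
            if a == b:
                idx = members[ra]
                pairs = [fst[p, q] for n, p in enumerate(idx) for q in idx[n + 1:]]
                if pairs:
                    out[a, a] = np.mean(pairs)
            else:
                out[a, b] = fst[np.ix_(members[ra], members[rb])].mean()
    return labels, out

# Toy values, made up for illustration only.
toy_fst = [[0.00, 0.02, 0.10],
           [0.02, 0.00, 0.14],
           [0.10, 0.14, 0.00]]
toy_regions = ["X", "X", "Y"]
labels, regional = regional_mean_fst(toy_fst, toy_regions)
```

In this toy case region X contains two populations and region Y only one, so the Y diagonal cell is left undefined.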
Figure 2
Genome-Wide Structure of the Studied Populations Revealed by 530,000 SNPs. (A) Principal component analysis of the Eurasian populations. The following abbreviations are used: IE, Indo European speakers; DR, Dravidic speakers; AA, Austroasiatic speakers; TB, Tibeto Burman speakers; , data from HapMap. (B) ADMIXTURE analysis at K = 8 and 12. The following symbols are used: ∗, contains one Dhurwa; ∗∗, contains one Lambadi; 1, Rajasthan; 2, Chattisgarh and Jharkhand; 3, Chattisgarh, Orissa, and Madhya Pradesh. A.P., Andhra Pradesh; Kar, Karnataka; Ker, Kerala; T. Nadu, Tamil Nadu; #, Nihali language isolate speakers from Maharashtra; §, Tibeto Burman speakers from east Indian states Meghalaya and Nagaland; AA, Austroasiatic languages.
Figure 3
Sharing Signals for Selection between Continental Populations. (A) iHS signal sharing between continental populations. The fraction of signals found in the top 1% of iHS scores in population i and the top 5% of population j is given in cell (i,j). Africa refers to Yoruba, Mandenka, and Bantu individuals from the HGDP-CEPH panel. (B) XP-EHH signal sharing between continental populations. The fraction of signals found in the top 1% of XP-EHH scores in population i and the top 5% of population j is given in cell (i,j). Africa refers to Yoruba, Mandenka, and Bantu individuals from the HGDP-CEPH panel.
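The cell (i,j) definition in this caption can be sketched as follows. The sketch assumes one summary score per genomic window and per population, with windows in the same order for every population and larger values meaning more extreme signals. The function name `sharing_matrix`, the quantile-based cutoffs, and the toy scores are illustrative assumptions and may differ from how the authors defined and ranked their windows.

```python
import numpy as np

def sharing_matrix(scores, top_i=0.01, top_j=0.05):
    """Fraction of population i's top windows that are also top windows in j.

    scores : dict mapping population name -> 1D array of per-window scores
             (hypothetical input; same windows in the same order everywhere)
    top_i  : tail fraction defining a signal in population i (1% by default)
    top_j  : tail fraction defining a match in population j (5% by default)
    """
    pops = list(scores)
    out = np.zeros((len(pops), len(pops)))
    for a, pi in enumerate(pops):
        si = np.asarray(scores[pi], dtype=float)
        # Windows in the top 1% of population i.
        mask_i = si >= np.quantile(si, 1 - top_i)
        for b, pj in enumerate(pops):
            sj = np.asarray(scores[pj], dtype=float)
            # Windows in the top 5% of population j.
            mask_j = sj >= np.quantile(sj, 1 - top_j)
            out[a, b] = (mask_i & mask_j).sum() / mask_i.sum()
    return pops, out

# Toy scores, made up for illustration: B copies A, C reverses A.
base = np.arange(1000)
pops, shared = sharing_matrix({"A": base, "B": base.copy(), "C": base[::-1]})
```

The matrix is not symmetric in general, because the 1% and 5% thresholds apply to different populations in cell (i,j) and cell (j,i).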
