2009 Jan;30(1):69-78.
doi: 10.1002/humu.20822.

Ancestry Informative Marker Sets for Determining Continental Origin and Admixture Proportions in Common Populations in America

To provide a resource for assessing continental ancestry in a wide variety of genetic studies, we identified, validated, and characterized a set of 128 ancestry informative markers (AIMs). The markers were chosen for informativeness, genome-wide distribution, and genotype reproducibility on two platforms (TaqMan assays and Illumina arrays). We analyzed genotyping data from 825 subjects with diverse ancestry, including European, East Asian, Amerindian, African, South Asian, Mexican, and Puerto Rican. A comprehensive set of 128 AIMs and subsets as small as 24 AIMs are shown to be useful tools for ascertaining the origin of subjects from particular continents and for correcting for population stratification in admixed population sample sets. Our findings provide general guidelines for applying specific AIM subsets across a wide range of studies. We conclude that investigators can use TaqMan assays for the selected AIMs as a simple and cost-efficient tool to control for differences in continental ancestry when conducting association studies in ethnically diverse populations.


Figure 1. Analysis of population genetic structure using In4 AIMs
Each vertical line represents an individual subject. Self-identified population groups are shown along the abscissa: European American (EURA, 188 subjects), West African (AFR, 98 subjects), Amerindian (AMI, 88 subjects), East Asian (EAS, 105 subjects), South Asian (SAS, 64 subjects), African American (88 subjects), Puerto Rican American (PRA, 28 subjects), Mexican American (MAM, 40 subjects), and Mexican (MXN, 26 subjects). Analyses were performed without any prior population assignment. Results for the 128 In4 marker set are shown for four population groups (K=4) in (A) and for K=5 in (B). Results for the 64 In4 marker set are shown for K=5 in (C) and for K=3 (without East or South Asian samples) in (D).
Figure 2. Correlation between the estimations of genetic contribution using different AIM sets and 128 In4 AIMs
The abscissa shows the result from the 128 In4 AIM set and the ordinate shows the result from the color-coded AIM set. Individual estimates, based on STRUCTURE analyses, are shown for the African contribution in African Americans [(A) and (B)], the European contribution in Puerto Ricans [(C) and (D)], and the Amerindian contribution in Mexicans and Mexican Americans [(E) and (F)].
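The kind of comparison plotted in Figure 2 can be summarized with a correlation coefficient between per-individual ancestry estimates from a smaller AIM set and those from the full 128-marker set. The following is a minimal sketch of that calculation only. The function name `pearson_r` and the numeric values are hypothetical and invented for illustration; they are not data from the study, and the source does not state which correlation statistic the authors reported.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of estimates."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of squared deviations and of cross-products
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-individual ancestry fractions (NOT values from the paper):
# estimates from the full marker set and from a smaller subset.
full_set = [0.70, 0.80, 0.85, 0.90, 0.95]
subset = [0.65, 0.82, 0.80, 0.93, 0.90]

r = pearson_r(full_set, subset)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```

A value of r close to 1 would indicate that the smaller set ranks and scales individuals much as the full set does.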
Figure 3. Correction of population stratification in association tests using different AIM sets
Three population-specific alleles were used to model phenotypes prevalent in a particular population. The ordinate shows the χ2 value; the first value is the Armitage test result. The abscissa shows the correction for false-positive association tests (EIGENSTRAT analyses) using either 200K SNP markers or the selected AIM sets. Surrogate cases are defined by homozygosity for allele A of rs2675348 in the SLC24A5 locus [(A) and (D)], allele A of rs1446585 in the LCT locus (B), or allele A of rs100008281 in the ADH1B locus (C). The surrogate cases were chosen from 865 samples from the EURA, AFR, and EAS populations in (A), (B), and (C), and from 1847 African American samples in (D). The dashed bold line represents the nominal significance level (p=0.05) corrected for 200K independent tests: χ2 = 26.6 (p=2.5e-7). The marker shade/color indicates the marker's location relative to the locus chosen to define the surrogate phenotype: dark markers are located on chromosomes that do not contain that locus, while lighter markers are located near it.
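The threshold quoted in the Figure 3 caption can be checked with a few lines of arithmetic. The sketch below assumes a Bonferroni-style correction (0.05 divided by 200,000 tests) and a χ2 distribution with 1 degree of freedom, which is the usual reference distribution for the Armitage trend test. It also computes an uncorrected trend statistic from genotype counts. The function names and the genotype counts are hypothetical and made up for illustration; the EIGENSTRAT correction itself is not reproduced here.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def chi2_critical_1df(alpha, lo=0.0, hi=100.0):
    """Find the chi-square (1 df) value whose upper tail equals alpha, by bisection."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf_1df(mid) > alpha:
            lo = mid  # tail still too heavy, move right
        else:
            hi = mid
    return (lo + hi) / 2.0

def armitage_trend_chi2(cases, controls):
    """Uncorrected Armitage trend chi-square with additive weights 0, 1, 2.

    cases and controls are counts per genotype: [0 copies, 1 copy, 2 copies].
    """
    totals = [r + s for r, s in zip(cases, controls)]
    n = sum(totals)
    n_cases = sum(cases)
    n_controls = sum(controls)
    sum_g = sum(i * t for i, t in enumerate(totals))
    sum_g2 = sum(i * i * t for i, t in enumerate(totals))
    sum_gy = sum(i * r for i, r in enumerate(cases))
    a = sum_gy - sum_g * n_cases / n   # N * cov(genotype, phenotype)
    b = sum_g2 - sum_g ** 2 / n        # N * var(genotype)
    return n * n * a * a / (n_cases * n_controls * b)

alpha = 0.05 / 200_000                 # corrected significance level
threshold = chi2_critical_1df(alpha)   # close to the 26.6 quoted in the caption

# Hypothetical genotype counts (NOT from the study), for illustration only.
chi2 = armitage_trend_chi2(cases=[10, 20, 70], controls=[40, 40, 20])
print(f"alpha = {alpha:.2e}, threshold = {threshold:.2f}, chi2 = {chi2:.2f}")
print("exceeds threshold" if chi2 > threshold else "below threshold")
```

Under these assumptions the corrected level is 2.5e-7 and the matching critical value rounds to 26.6, in agreement with the caption.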

