Genet Epidemiol. 2018 Sep;42(6):500-515.
doi: 10.1002/gepi.22133. Epub 2018 Jun 3.

Analysis of Pedigree Data in Populations With Multiple Ancestries: Strategies for Dealing With Admixture in Caribbean Hispanic Families From the ADSP

Free PMC article

Rafael A Nafikov et al. Genet Epidemiol.

Abstract

Multipoint linkage analysis is an important approach for localizing disease-associated loci in pedigrees. Linkage analysis, however, is sensitive to misspecification of marker allele frequencies. Pedigrees from recently admixed populations are particularly susceptible to this problem because of the challenge of accurately accounting for population structure. Therefore, increasing emphasis on use of multiethnic samples in genetic studies requires reevaluation of best practices, given data currently available. Typical strategies have been to compute allele frequencies from the sample, or to use marker allele frequencies determined by admixture proportions averaged over the entire sample. However, admixture proportions vary among pedigrees and throughout the genome in a family-specific manner. Here, we evaluate several approaches to model admixture in linkage analysis, providing different levels of detail about ancestral origin. To perform our evaluations, for specification of marker allele frequencies, we used data on 67 Caribbean Hispanic admixed families from the Alzheimer's Disease Sequencing Project. Our results show that choice of admixture model has an effect on the linkage analysis results. Variant-specific admixture proportions, computed for individual families, provide the most detailed regional admixture estimates, and, as such, are the most appropriate allele frequencies for linkage analysis. This likely decreases the number of false-positive results, and is straightforward to implement.
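The abstract does not spell out how admixture proportions are turned into marker allele frequencies. A minimal sketch, assuming the common weighted-average form in which the admixed frequency is the sum over ancestries of (admixture proportion × reference-population frequency), is given below. The function name, the ancestry labels used as dictionary keys, and every numeric value are hypothetical illustrations, not data or code from the study.

```python
from typing import Dict


def admixed_allele_frequency(
    proportions: Dict[str, float],
    ref_freqs: Dict[str, float],
) -> float:
    """Weighted average of ancestral allele frequencies.

    proportions: admixture proportion per ancestry (e.g. EUR, AFR, AMR).
    ref_freqs:   allele frequency of the same variant in each reference
                 population (hypothetical values in this sketch).
    """
    if set(proportions) != set(ref_freqs):
        raise ValueError("proportions and ref_freqs must cover the same ancestries")
    total = sum(proportions.values())
    if total <= 0:
        raise ValueError("admixture proportions must sum to a positive value")
    # Normalize so the proportions sum to 1, then take the weighted average.
    return sum(proportions[a] / total * ref_freqs[a] for a in proportions)


# Hypothetical reference frequencies for one variant.
ref = {"EUR": 0.10, "AFR": 0.60, "AMR": 0.30}

# Hypothetical sample-wide ("Global") proportions versus proportions
# estimated for one family at this variant ("Local").
global_q = {"EUR": 0.6, "AFR": 0.3, "AMR": 0.1}
local_q = {"EUR": 0.1, "AFR": 0.8, "AMR": 0.1}

global_af = admixed_allele_frequency(global_q, ref)
local_af = admixed_allele_frequency(local_q, ref)
print(f"Global model AF: {global_af:.2f}")
print(f"Local model AF:  {local_af:.2f}")
```

With these made-up inputs the two models assign clearly different frequencies to the same variant, which is the kind of discrepancy the abstract describes as affecting linkage results when admixture varies by family and by genomic region.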

Keywords: Markov Chain Monte Carlo; complex trait; large pedigrees; late-onset disease; missing data.

Figures

Figure 1
Local ancestry estimation and PCA. Solid and dashed line boxes represent data sets and analytical procedures used on these data sets, respectively, with heavy lines indicating data sets used in additional analyses reported here. ADSP, Alzheimer’s Disease Sequencing Project; CH, Caribbean Hispanics; GENESIS, GENetic EStimation and Inference in Structured samples package; GWAS, Genome-Wide Association Study; KING-robust, Kinship-based INference for Gwas robust method; PCA, principal component (PC) analysis.
Figure 2
PCA of the ADSP admixed CH population. Dots and pluses represent “unrelated” and “related” subsets of individuals, respectively. PCA, principal component (PC) analysis.
Figure 3
Ancestry modeling and allele frequency specification in the ADSP CH families. (A) Family-based genome-wide (FBGW) admixture proportions of European, African, and Native American ancestries for each of the 67 families. The first bar shows the Global admixture proportions for the entire ADSP CH sample, and the two horizontal black lines drawn across the bar chart extend these sample-average values as reference points. (B–D) Scatter plots of alternative allele frequencies (AF) of genome-wide framework markers, computed in CU0048F using one of our four admixture models and the 1000 Genomes Project populations’ data. Dashed lines border the region where alternative AF differences are within ±0.1. FBCW, family-based chromosome-wide; ADSP and CH are defined in the Figure 1 legend.
Figure 4
Possible inflation of logarithm of odds (LOD) scores in multipoint linkage analysis when generalized admixture models are used to compute framework marker allele frequencies. (A) Upper part of the plot shows LOD scores for multipoint linkage analysis with incomplete penetrance model on chr 18 in CU0048F. Global, Family-Based Genome-Wide (FBGW), Family-Based Chromosome-Wide (FBCW), and Local models of ancestry specification were used to compute framework marker allele frequencies used in the linkage analysis. Lower parts of the plot show admixture proportions for European (EUR), African (AFR), and Native American (AMR) ancestries computed using four different admixture models mentioned above. (B), (C), and (D) are scatter plots of LOD scores for which allele frequencies were computed with one of the four different admixture models mentioned above. ADSP and CH are defined in Figure 1 legend.
Figure 5
Possible inflation and deflation of logarithm of odds (LOD) scores and linkage region misspecification in multipoint linkage analysis when generalized admixture models are used to compute framework marker allele frequencies. Upper parts of the plots show LOD scores for multipoint linkage analysis with incomplete penetrance model on chr 7 (A), chr 22 (B), chr 17 (C), and chr 13 (D) in CU0030F, CU0005F, CU0042F, and CU0018F, respectively. Global, Family-Based Genome-Wide (FBGW), Family-Based Chromosome-Wide (FBCW), and Local models of ancestry specification were used to compute framework marker allele frequencies used in the linkage analysis. Lower parts of the plots show admixture proportions for European (EUR), African (AFR), and Native American (AMR) ancestries computed in the four different families using four different admixture models mentioned above. ADSP and CH are defined in Figure 1 legend.
Figure 6
Insensitivity of multipoint linkage analysis to allele frequency misspecification in pedigree CU0040F. (A) Upper part of the plot shows logarithm of odds (LOD) scores for multipoint linkage analysis with incomplete penetrance model on chr 3. Global, Family-Based Genome-Wide (FBGW), Family-Based Chromosome-Wide (FBCW), and Local models of ancestry specification were used to compute framework marker allele frequencies used in the linkage analysis. Lower parts of the plot show admixture proportions for European (EUR), African (AFR), and Native American (AMR) ancestries computed using four different admixture models mentioned above. (B), (C), and (D) are scatter plots of LOD scores for which allele frequencies were computed with one of the four different admixture models mentioned above. ADSP and CH are defined in Figure 1 legend.
