Genet Epidemiol. 2014 Nov;38(7):579-90.
doi: 10.1002/gepi.21844. Epub 2014 Aug 1.

Combining family- and population-based imputation data for association analysis of rare and common variants in large pedigrees

Free PMC article


Mohamad Saad et al. Genet Epidemiol.

Abstract

In the last two decades, complex traits have become the main focus of genetic studies, and the hypothesis that both rare and common variants are associated with complex traits is increasingly being discussed. Family-based association studies using relatively large pedigrees are suitable for identifying both rare and common variants. Because sequencing remains costly, imputation methods are an important, low-cost way to increase the amount of available genotype information. A recent family-based imputation method, Genotype Imputation Given Inheritance (GIGI), can handle large pedigrees and accurately impute rare variants, but it performs less well for common variants, for which population-based methods perform better. Here, we propose a flexible approach to combine imputation data from both family- and population-based methods. We also extend the Sequence Kernel Association Test for Rare and Common variants (SKAT-RC), originally proposed for data from unrelated subjects, to family data so that such imputed data can be used; we call this extension "famSKAT-RC." We compare the performance of famSKAT-RC with several other existing burden and kernel association tests. In simulated pedigree sequence data, our results show that the combining approach increases imputation accuracy. They also show that, for rare and common variants, it increases the power of the association tests relative to using either family- or population-based imputation alone. Moreover, our results show that famSKAT-RC performs better than the other tests considered in most of the scenarios investigated here.
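The abstract does not spell out how the family- and population-based imputations are combined. As an illustration only, the Python sketch below assumes that each method returns per-person genotype probabilities P(AA), P(Aa), P(aa) (the quantities shown in Figure 1). It converts them to an allelic dosage and its variance, then applies a hypothetical rule: keep the family-based probabilities (e.g. from GIGI) unless the population-based ones (e.g. from BEAGLE) give a strictly smaller dosage variance. The function names and the variance-based rule are assumptions, not necessarily the authors' method.

```python
from typing import Tuple

# Genotype probabilities for one person at one variant: (P(AA), P(Aa), P(aa)).
GenotypeProbs = Tuple[float, float, float]


def dosage(p: GenotypeProbs) -> float:
    """Expected count of allele 'a': 0*P(AA) + 1*P(Aa) + 2*P(aa)."""
    _, p_het, p_hom = p
    return p_het + 2.0 * p_hom


def dosage_variance(p: GenotypeProbs) -> float:
    """Variance of the 'a' allele count under p: E[g^2] - E[g]^2."""
    _, p_het, p_hom = p
    mean = p_het + 2.0 * p_hom
    second_moment = p_het + 4.0 * p_hom
    return second_moment - mean * mean


def combine_by_variance(family: GenotypeProbs,
                        population: GenotypeProbs) -> GenotypeProbs:
    """HYPOTHETICAL combining rule (not necessarily the paper's): keep the
    family-based probabilities unless the population-based ones give a
    strictly smaller dosage variance, i.e. a more certain genotype call."""
    if dosage_variance(population) < dosage_variance(family):
        return population
    return family


if __name__ == "__main__":
    gigi_like = (0.02, 0.96, 0.02)    # made-up family-based output
    beagle_like = (0.50, 0.40, 0.10)  # made-up population-based output
    chosen = combine_by_variance(gigi_like, beagle_like)
    print(chosen, round(dosage(chosen), 3))
```

The rule is applied per person and per variant, so a pedigree member with an uninformative inheritance pattern at a locus can fall back to the population-based call while others keep the family-based one. Other rules (for example, switching by MAF) would fit the same interface.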

Keywords: MCMC; association analysis; burden test; inheritance vectors; kernel statistic; mixed linear model; sequence data; variance components.
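The kernel tests compared here (famSKAT, famSKAT-RC) belong to the SKAT family, whose score statistic is a weighted quadratic form in the genotypes. The Python sketch below shows only that shared quadratic form, Q = r' G W² G' r = Σ_j (w_j g_j' r)², under stated assumptions. It assumes the adjusted residual vector r = Σ⁻¹(y − Xβ̂) has already been computed from a null linear mixed model whose covariance Σ includes a kinship-based variance component (for unrelated subjects Σ is proportional to the identity). The exact rare/common weighting used by famSKAT-RC is not given in this abstract, so weights are supplied by the caller; the Beta(1, 25) helper reproduces SKAT's usual default rare-variant weight. P-values, which SKAT-type tests typically obtain from a weighted mixture of chi-square distributions, are not computed here.

```python
from typing import Sequence


def beta_1_25_weight(maf: float) -> float:
    """Beta(1, 25) density at the MAF, i.e. 25 * (1 - maf)**24.
    This up-weights rarer variants (SKAT's common default)."""
    return 25.0 * (1.0 - maf) ** 24


def kernel_score_statistic(genotypes: Sequence[Sequence[float]],
                           adjusted_residuals: Sequence[float],
                           weights: Sequence[float]) -> float:
    """Q = r' G W^2 G' r = sum_j (w_j * g_j' r)^2.

    genotypes: n x m matrix (rows = individuals, columns = variants);
        entries may be imputed allelic dosages rather than hard calls.
    adjusted_residuals: r = Sigma^{-1} (y - X beta_hat), ASSUMED precomputed
        from a null mixed model that accounts for relatedness.
    weights: one weight w_j per variant (ASSUMED chosen by the caller).
    """
    n = len(adjusted_residuals)
    q = 0.0
    for j, w in enumerate(weights):
        # Per-variant score: inner product of genotype column j with r.
        score_j = sum(genotypes[i][j] * adjusted_residuals[i] for i in range(n))
        q += (w * score_j) ** 2
    return q


if __name__ == "__main__":
    G = [[0, 1], [1, 0], [2, 1]]  # made-up dosages for 3 people, 2 variants
    r = [1.0, -1.0, 0.5]          # made-up adjusted residuals
    print(kernel_score_statistic(G, r, [1.0, 2.0]))
```

Writing Q as a sum of squared weighted per-variant scores avoids forming the n × n kernel matrix G W² G' explicitly, which matters for large pedigrees.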

Figures

Figure 1
Joint probabilities of possible genotypes (AA, Aa, aa) and their variances.
Figure 2
Correlation between allelic dosages obtained by GIGI and the true genotypes (x-axis) versus correlation between allelic dosages obtained by BEAGLE and the true genotypes (y-axis), for different MAF bins: A) LowLD pattern, B) HighLD pattern.
Figure 3
Correlation between allelic dosages obtained by GIGI+BEAGLE and the true genotypes (x-axes) versus correlation with the true genotypes (y-axes) of allelic dosages obtained by BEAGLE (first row), GIGI (second row), and the maximum of the GIGI and BEAGLE correlations (third row). A) LowLD pattern, B) HighLD pattern. Within each LD-pattern column, the left panels show MAF>0.01 and the right panels show MAF<=0.01.
Figure 4
Power of famSKAT, famSKAT-B, famSKAT-RC, and famCMWS in the sequence data under the LowLD pattern, for different settings of the number of associated (A) and non-associated (U) SNPs and the proportion of common SNPs among them (fc): A) model with associated SNPs only: A=10, fc=0.3; A=10, fc=0.5; A=20, fc=0.3; and A=20, fc=0.5; B) model with associated and non-associated SNPs: A=10, U=20, fc=0.3; A=10, U=20, fc=0.5; A=20, U=40, fc=0.3; and A=20, U=40, fc=0.5.
Figure 5
Power of famCMWS for the different imputed data sets and for the sequence data, for a model with associated SNPs only, for different settings of the number of associated SNPs (A) and the proportion of common SNPs among them (fc): A=10, fc=0.3; A=10, fc=0.5; A=20, fc=0.3; and A=20, fc=0.5, where fc is the proportion of common associated SNPs. A) LowLD pattern; B) HighLD pattern.
Figure 6
Power of famCMWS for the different combined imputation data sets (GIGI+BEAGLE, G+B+T, and G_S+B), under the LowLD pattern, for a model with associated SNPs only, for different settings of the number of associated SNPs (A) and the proportion of common SNPs among them (fc): A=10, fc=0.3; A=10, fc=0.5; A=20, fc=0.3; and A=20, fc=0.5, where fc is the proportion of common associated SNPs.
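Figures 2 and 3 measure imputation accuracy as the correlation between imputed allelic dosages and the true genotypes, split by MAF. The Python sketch below shows one way to compute such a metric. It assumes the true genotypes are coded as allele counts (0, 1, 2), that data are organized as one row per variant, and that (dosage, true count) pairs are pooled over all individuals and variants within a bin; per-variant correlations are an equally plausible alternative, and the paper's exact choice is not stated here. The bin names and the 0.01 cutoff follow the figure captions; the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; NaN if fewer than 2 points or zero variance."""
    n = len(x)
    if n < 2:
        return float("nan")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def accuracy_by_maf_bin(true_counts: Sequence[Sequence[int]],
                        dosages: Sequence[Sequence[float]],
                        mafs: Sequence[float],
                        cutoff: float = 0.01) -> Dict[str, float]:
    """Pool (dosage, true count) pairs per MAF bin and correlate them.
    Each row of true_counts/dosages holds one variant across individuals."""
    pooled: Dict[str, List[List[float]]] = {"rare": [[], []], "common": [[], []]}
    for truth, dose, maf in zip(true_counts, dosages, mafs):
        key = "rare" if maf <= cutoff else "common"
        pooled[key][0].extend(dose)
        pooled[key][1].extend(truth)
    return {key: pearson(x, y) for key, (x, y) in pooled.items()}


if __name__ == "__main__":
    truth = [[0, 1, 2], [0, 0, 1]]              # made-up true allele counts
    doses = [[0.1, 0.9, 1.8], [0.0, 0.2, 0.9]]  # made-up imputed dosages
    print(accuracy_by_maf_bin(truth, doses, [0.3, 0.005]))
```

Computing this metric separately for the family-based, population-based, and combined dosages would give the paired accuracies plotted against each other in Figures 2 and 3.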
