Genet Epidemiol. 2017 Dec;41(8):744-755.
doi: 10.1002/gepi.22067. Epub 2017 Sep 1.

Improving power of association tests using multiple sets of imputed genotypes from distributed reference panels

Wei Zhou et al. Genet Epidemiol. 2017 Dec.

Abstract

The accuracy of genotype imputation depends on two factors: the sample size of the reference panel and the genetic similarity between the reference panel and the target samples. When multiple reference panels cannot be combined because consent does not permit it, it is unclear how best to combine the imputation results to optimize the power of genetic association studies. We compared the imputation accuracy for 9,265 Norwegian genomes imputed from three reference panels: 1000 Genomes phase 3 (1000G), the Haplotype Reference Consortium (HRC), and a reference panel of 2,201 Norwegian participants from the population-based Nord Trøndelag Health Study (HUNT), generated by low-pass genome sequencing. We observed that the population-matched reference panel allowed imputation of more population-specific variants at lower frequencies (minor allele frequency (MAF) between 0.05% and 0.5%). The overall imputation accuracy from the population-specific panel was substantially higher than that from 1000G and comparable with that from HRC, even though HRC is 15-fold larger. These results reaffirm the value of population-specific reference panels for genotype imputation. We also evaluated different strategies for using multiple sets of imputed genotypes to increase the power of association studies. We observed that testing association for all variants imputed from any panel gives higher power to detect association than the alternative strategy of including only one version of each genetic variant, namely the version with the highest imputation quality metric. This was particularly true for lower-frequency variants (MAF < 1%), even after adjusting for the additional multiple testing burden.
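The two strategies compared in the abstract can be illustrated with a minimal sketch. The names `PanelResult`, `best_rsq_p`, `best_p_value_p`, and `bonferroni_threshold` are hypothetical, the example numbers are invented for illustration, and the Bonferroni correction is an assumption: the abstract states only that the additional multiple testing burden was adjusted for, not how. The 0.3 quality cutoff follows the ImpRsq > 0.3 filter mentioned in the Figure 4 legend.

```python
from typing import Dict, List, NamedTuple, Optional


class PanelResult(NamedTuple):
    rsq: float      # imputation quality metric reported for this panel
    p_value: float  # association p-value from this panel's imputed genotypes


def _usable(results: Dict[str, PanelResult], min_rsq: float) -> List[PanelResult]:
    # Drop poorly imputed versions of the variant.
    return [r for r in results.values() if r.rsq > min_rsq]


def best_rsq_p(results: Dict[str, PanelResult], min_rsq: float = 0.3) -> Optional[float]:
    """Keep one version per variant: the one with the highest imputation quality."""
    usable = _usable(results, min_rsq)
    if not usable:
        return None
    return max(usable, key=lambda r: r.rsq).p_value


def best_p_value_p(results: Dict[str, PanelResult], min_rsq: float = 0.3) -> Optional[float]:
    """Test every usable version of the variant and report the smallest p-value."""
    usable = _usable(results, min_rsq)
    if not usable:
        return None
    return min(r.p_value for r in usable)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    # Assumed correction: more tests (all versions) means a stricter threshold.
    return alpha / n_tests


# Invented example: one variant imputed from all three panels.
variant = {
    "1000G": PanelResult(rsq=0.2, p_value=1e-12),  # filtered out (rsq <= 0.3)
    "HRC": PanelResult(rsq=0.9, p_value=1e-6),
    "HUNT": PanelResult(rsq=0.8, p_value=1e-9),
}
```

Here `best_rsq_p(variant)` returns the HRC p-value and `best_p_value_p(variant)` returns the HUNT p-value. Under the assumed correction, the second strategy would be judged against a threshold computed from the larger number of tests.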

Keywords: GWAS; genotype imputation; multiple reference panels; population-specific; study power.

Conflict of interest statement

The authors have no conflict of interest to declare.

Figures

Figure 1. Number of variants imputed by each reference panel
Each percentage is the number of variants as a share of the 23.8 million variants successfully imputed by any of the three reference panels.
Figure 2. Distribution of the number of variants imputed from only one reference panel or from multiple reference panels, by MAF category
Variants imputed only by 1000G are split into SNPs and non-SNP variants; the latter include indels, deletions, complex short substitutions, and other structural variant classes. 1000G, 1000 Genomes Phase 3; WGS, whole-genome sequencing; HRC, Haplotype Reference Consortium; MAF, minor allele frequency.
Figure 3. HRC and HUNT WGS panels show comparable imputation quality
a. Mean empirical R² (y axis) reported by each reference panel for directly genotyped variants, by MAF (x axis), with no ImpRsq threshold applied. b. Mean imputation R² (y axis) reported by each reference panel for directly genotyped variants, by MAF (x axis), with no ImpRsq threshold applied. c. Mean imputation R² (y axis) reported by each reference panel for all imputed variants with ImpRsq > 0.3, by MAF (x axis). d. Mean imputation R² (y axis) reported by each reference panel for all imputed variants, by MAF (x axis), with no ImpRsq threshold applied. 1000G, 1000 Genomes Phase 3; WGS, whole-genome sequencing; HRC, Haplotype Reference Consortium; MAF, minor allele frequency; ImpRsq, imputation quality metric R².
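As a rough illustration of the quantity plotted in panel a, the sketch below assumes the common definition of empirical R²: the squared Pearson correlation between the imputed dosages and the directly genotyped allele counts for a variant. The legend does not spell out the formula, so this definition is an assumption, and the function names and toy data are hypothetical.

```python
from statistics import mean
from typing import Sequence


def squared_correlation(dosages: Sequence[float], genotypes: Sequence[float]) -> float:
    """Squared Pearson correlation between imputed dosages and true genotypes."""
    mx, my = mean(dosages), mean(genotypes)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, genotypes))
    sxx = sum((x - mx) ** 2 for x in dosages)
    syy = sum((y - my) ** 2 for y in genotypes)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation, so the correlation is undefined; report 0
    return (sxy * sxy) / (sxx * syy)


def minor_allele_frequency(genotypes: Sequence[int]) -> float:
    """MAF from diploid allele counts coded 0, 1, or 2."""
    allele_freq = sum(genotypes) / (2 * len(genotypes))
    return min(allele_freq, 1 - allele_freq)


# Toy data for one variant in five samples (invented for illustration).
true_genotypes = [0, 1, 2, 0, 1]
imputed_dosages = [0.1, 0.9, 1.8, 0.0, 1.2]

empirical_r2 = squared_correlation(imputed_dosages, true_genotypes)
maf = minor_allele_frequency(true_genotypes)
```

Averaging `empirical_r2` over all variants that fall in the same MAF bin would give one point on a curve like those in panel a.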
Figure 4. Comparison of power to detect true associations between best p-value and best Rsq approaches via simulation studies
To estimate power, 3,000 directly genotyped variants were randomly selected in each MAF category, with MAF estimated from the chip-array genotypes. Power is calculated as the proportion of significantly associated variants across the three imputed panels under each strategy, given the corresponding significance threshold. ImpRsq > 0.3 was applied to remove poorly imputed genotypes. The numbers of variants that were successfully imputed from at least two reference panels and used in the simulation studies are: 2,513 with MAF > 0 and ≤ 0.001; 2,989 with MAF > 0.001 and ≤ 0.01; 3,000 with MAF > 0.01 and ≤ 0.05; and 3,000 with MAF > 0.05. MAF, minor allele frequency; ImpRsq, imputation quality metric R².
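The power calculation described in this legend can be sketched as a simple proportion. The function names are hypothetical, the p-values are invented, and treating variants without a usable p-value as excluded from the denominator is an assumption. The MAF boundaries are the four categories listed in the legend.

```python
from typing import Optional, Sequence


def maf_category(maf: float) -> Optional[str]:
    """Assign a MAF to one of the four categories listed in the legend."""
    if maf <= 0:
        return None  # monomorphic variants fall outside every category
    if maf <= 0.001:
        return "(0, 0.001]"
    if maf <= 0.01:
        return "(0.001, 0.01]"
    if maf <= 0.05:
        return "(0.01, 0.05]"
    return "> 0.05"


def estimate_power(p_values: Sequence[Optional[float]], threshold: float) -> float:
    """Proportion of simulated causal variants with p-value below the threshold."""
    tested = [p for p in p_values if p is not None]
    if not tested:
        return float("nan")
    return sum(p < threshold for p in tested) / len(tested)


# Invented p-values for four simulated variants in one MAF category;
# None marks a variant with no usable imputed version.
simulated_p = [1e-9, 0.2, 1e-8, 0.5, None]
power = estimate_power(simulated_p, threshold=5e-8)
```

Running `estimate_power` once per strategy and per MAF category, each with its own significance threshold, would produce a comparison of the kind shown in the figure.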
