Next Generation Genome-Wide Association Tool: Design and Coverage of a High-Throughput European-optimized SNP Array

Thomas J Hoffmann et al. Genomics.

Abstract

The success of genome-wide association studies has paralleled the development of efficient genotyping technologies. We describe the development of a next-generation microarray based on the new, highly efficient Affymetrix Axiom genotyping technology that we are using to genotype individuals of European ancestry from the Kaiser Permanente Research Program on Genes, Environment and Health (RPGEH). The array contains 674,517 SNPs and provides excellent genome-wide, gene-based, and candidate-SNP coverage. Coverage was calculated using an approach based on imputation and cross-validation. Preliminary results for the first 80,301 saliva-derived DNA samples from the RPGEH demonstrate very high-quality genotypes, with sample success rates above 94% and over 98% of successful samples having SNP call rates exceeding 98%. At steady state, each Axiom system has produced 462 million genotypes per week. The new array provides a valuable addition to the repertoire of tools for large-scale genome-wide association studies.
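As a rough consistency check on the throughput figure, assume (this is not stated in the abstract) that every genotype counted is one call at one of the array's 674,517 SNPs in one sample. The weekly output of a single Axiom system then corresponds to roughly 685 samples:

$$
\frac{462 \times 10^{6}\ \text{genotypes/week}}{674{,}517\ \text{SNPs/sample}} \approx 684.9 \approx 685\ \text{samples/week}
$$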

Figures

Fig. 1
Results of QC analysis on the Kaiser Permanente RPGEH GERA cohort using the Affymetrix Axiom system. (a) Cumulative distribution of DQC scores for 80,301 genotyped saliva samples. (b) Sample call rate versus DQC for 76,412 genotyped saliva samples; only samples passing a DQC threshold of 0.82 are included. The red line indicates the call-rate threshold for passing. (c) Cumulative distribution of sample call rates for the 76,412 samples passing the 0.82 DQC threshold. (d) Cumulative distributions of call-nocalls and miscalls based on 818 duplicate samples.
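The two-stage sample filter in panels (a) to (c) and the duplicate comparison in panel (d) can be sketched in Python as follows. This is a minimal illustration, not the pipeline used for the cohort: the record type and function names are hypothetical, the call-rate cutoff is left as a parameter because the caption does not state its value, and the call-nocall and miscall definitions in the docstring are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

DQC_THRESHOLD = 0.82  # DQC cutoff stated in the Fig. 1 caption


@dataclass
class SampleQC:
    """Hypothetical per-sample QC record (field names are illustrative)."""
    sample_id: str
    dqc: float        # DQC score for the sample
    call_rate: float  # fraction of array SNPs with a genotype call, 0..1


def passing_samples(samples: Sequence[SampleQC], call_rate_threshold: float) -> List[str]:
    """Two-stage filter: DQC first, then call rate among DQC-passing samples.

    The numeric call-rate cutoff (red line in panel b) is not given in the
    caption, so it is a parameter here rather than a hard-coded value.
    """
    dqc_pass = [s for s in samples if s.dqc >= DQC_THRESHOLD]
    return [s.sample_id for s in dqc_pass if s.call_rate >= call_rate_threshold]


def duplicate_discordance(
    a: Sequence[Optional[int]], b: Sequence[Optional[int]]
) -> Tuple[float, float]:
    """Compare two replicate genotype vectors (0/1/2, None = no-call).

    Assumed definitions:
      call-nocall rate = SNPs called in exactly one replicate / all SNPs
      miscall rate     = discordant SNPs / SNPs called in both replicates
    """
    if len(a) != len(b):
        raise ValueError("replicates must cover the same SNPs")
    call_nocall = sum((x is None) != (y is None) for x, y in zip(a, b))
    both_called = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    miscalls = sum(x != y for x, y in both_called)
    cn_rate = call_nocall / len(a) if len(a) else 0.0
    mc_rate = miscalls / len(both_called) if both_called else 0.0
    return cn_rate, mc_rate
```

For reference, the caption's counts imply that 76,412 / 80,301, or about 95.2%, of samples passed the DQC stage.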
Fig. 2
Genome-wide coverage for the new Axiom Genome-Wide EUR Array (solid lines) versus the Affymetrix 6.0 array (dashed lines) for a target set of Affymetrix-validated CEU SNPs using Affymetrix genotypes, stratified by minor allele frequency. Coverage is based on imputation with “leave-one-out cross-validation.” The numbers in parentheses in the legend are the numbers of markers in the target set in each minor allele frequency range.
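The idea behind imputation-based coverage with leave-one-out cross-validation can be sketched as follows. The sketch assumes that "leave-one-out" means holding out one reference individual at a time, predicting that individual's target-SNP genotypes from their array-SNP genotypes using everyone else, and then scoring each target SNP by the squared correlation (r²) between predicted and true genotypes across individuals. To stay self-contained it substitutes ordinary least-squares regression for a real haplotype-based imputation program, and the r² threshold of 0.8 and the MAF bin edges are illustrative choices, not values taken from the paper.

```python
import numpy as np


def loo_predicted_dosages(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Leave-one-individual-out predictions of target genotypes.

    X: (n_individuals, n_array_snps) 0/1/2 genotypes at SNPs on the array.
    Y: (n_individuals, n_target_snps) 0/1/2 genotypes at target SNPs.
    Stand-in predictor: least squares on the array SNPs, not real imputation.
    """
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])  # intercept + array SNPs
    pred = np.empty(Y.shape, dtype=float)
    for i in range(n):
        train = np.arange(n) != i  # hold out individual i
        beta, *_ = np.linalg.lstsq(design[train], Y[train], rcond=None)
        pred[i] = design[i] @ beta
    return pred


def squared_correlation(pred: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Per-target r^2 between held-out predictions and true genotypes."""
    r2 = np.zeros(Y.shape[1])
    for j in range(Y.shape[1]):
        if Y[:, j].std() == 0 or pred[:, j].std() == 0:
            continue  # correlation undefined; leave r^2 at 0
        r2[j] = np.corrcoef(pred[:, j], Y[:, j])[0, 1] ** 2
    return r2


def coverage_by_maf(Y, r2, bins=(0.01, 0.05, 0.10, 0.20, 0.50), threshold=0.8):
    """Fraction of target SNPs with r^2 >= threshold, per MAF bin.

    Bins are [lo, hi), except the last, which also includes its upper edge.
    Returns {(lo, hi): (fraction_covered, n_targets)}.
    """
    freq = Y.mean(axis=0) / 2.0
    maf = np.minimum(freq, 1.0 - freq)
    out = {}
    for k, (lo, hi) in enumerate(zip(bins[:-1], bins[1:])):
        last = k == len(bins) - 2
        in_bin = (maf >= lo) & ((maf <= hi) if last else (maf < hi))
        n_bin = int(in_bin.sum())
        frac = float(np.mean(r2[in_bin] >= threshold)) if n_bin else float("nan")
        out[(lo, hi)] = (frac, n_bin)
    return out
```

The per-bin counts returned alongside each fraction play the same role as the numbers in parentheses in the figure legends.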
Fig. 3
Genome-wide coverage for the new array for a target set of Affymetrix-validated CEU SNPs using either Affymetrix genotypes (dashed lines) or 1000 Genomes Low Pass (KGLP) genotypes (solid lines). Coverage is based on imputation with “leave-one-out cross-validation.” The numbers in parentheses in the legend are the numbers of markers in the target set in each minor allele frequency range.
Fig. 4
Genome-wide coverage for the new array (solid lines) versus the Affymetrix 6.0 array (dashed lines) for a target set of 1000 Genomes High Pass (KGHP) SNPs using 1000 Genomes Low Pass (KGLP) genotypes. Coverage is based on imputation with “leave-one-out cross-validation.” The numbers in parentheses in the legend are the numbers of markers in the target set in each minor allele frequency range.
Fig. 5
Genome-wide coverage of the new array for two complementary target sets: KGHP SNPs with Affymetrix-validated SNPs removed (solid lines), and KGLP SNPs with KGHP and Affymetrix-validated SNPs removed (dashed lines). Coverage is based on imputation with “leave-one-out cross-validation” using KGLP genotypes. The numbers in parentheses are the numbers of markers in the two target sets in each minor allele frequency range.
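The two target sets in Fig. 5 are plain set differences over SNP identifiers, and the construction guarantees that they do not overlap. A minimal sketch (the function name is hypothetical):

```python
from typing import Set, Tuple


def complementary_target_sets(
    kghp: Set[str], kglp: Set[str], affy_validated: Set[str]
) -> Tuple[Set[str], Set[str]]:
    """Build the two Fig. 5 target sets from sets of SNP identifiers.

    set_a: KGHP SNPs with Affymetrix-validated SNPs removed.
    set_b: KGLP SNPs with both KGHP and Affymetrix-validated SNPs removed.
    Because set_b excludes every KGHP SNP, the two sets cannot overlap.
    """
    set_a = kghp - affy_validated
    set_b = kglp - kghp - affy_validated
    return set_a, set_b
```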
Fig. 6
Coverage of the new array (solid lines) versus the Affymetrix 6.0 array (dashed lines) for a target set of Canary CNVs using Canary CNV and Affymetrix genotype data. Coverage is based on imputation with “leave-one-out cross-validation.” The numbers in parentheses are the numbers of CNVs in the target set in each minor allele frequency range.
Fig. 7
Greedy SNP selection algorithm. A set of SNPs is first chosen for reasons of biological importance, significance in published GWAS, etc., or as the result of previous rounds of greedy SNP selection. The “target” set of SNPs to be covered by tagging is established to fit the purpose of the current round of SNP selection, e.g., to maximize coverage of SNPs in coding regions or to maximize general coverage of the genome. SNPs available for placement on the microarray are then assessed for their ability to increase coverage of the target set. If coverage can be increased, a set of decision rules is applied to select the best single SNP to add to the selected list, as described in the text. This process continues until maximum coverage of the target set is achieved or no space for additional SNPs remains on the microarray.
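One round of the procedure in Fig. 7 can be sketched as below. This is a simplified illustration under stated assumptions: coverage is judged by pairwise r² from a hypothetical precomputed table rather than by imputation, a target counts as covered at r² ≥ 0.8, and the paper's decision rules (described in the main text, not in this caption) are replaced by a single hypothetical rule that picks the candidate covering the most not-yet-covered targets, with ties going to the earlier candidate.

```python
from typing import Dict, List, Set


def greedy_select(
    r2: Dict[str, Dict[str, float]],
    targets: Set[str],
    preselected: List[str],
    capacity: int,
    threshold: float = 0.8,
) -> List[str]:
    """Greedy tag-SNP selection in the spirit of Fig. 7 (simplified).

    r2[c][t] is a hypothetical precomputed r^2 between candidate c and
    target t. A target counts as covered once any selected SNP reaches
    `threshold` with it.
    """
    def tags(snp: str) -> Set[str]:
        return {t for t, v in r2.get(snp, {}).items() if t in targets and v >= threshold}

    selected = list(preselected)            # SNPs forced in for other reasons
    covered: Set[str] = set()
    for snp in selected:
        covered |= tags(snp)
    candidates = [c for c in r2 if c not in selected]

    while len(selected) < capacity:         # stop when the array is full
        best, best_gain = None, set()
        for c in candidates:                # simplified decision rule:
            gain = tags(c) - covered        # most newly covered targets,
            if len(gain) > len(best_gain):  # ties go to the earlier candidate
                best, best_gain = c, gain
        if best is None:                    # no SNP increases coverage
            break
        selected.append(best)
        covered |= best_gain
        candidates.remove(best)
    return selected
```

Calling the function again with a new target set and the previous output as `preselected` mirrors the caption's description of successive rounds.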
