Multicenter Study
Nature. 2007 Jun 7;447(7145):661-78.
doi: 10.1038/nature05911.

Genome-wide Association Study of 14,000 Cases of Seven Common Diseases and 3,000 Shared Controls

Free PMC article

Wellcome Trust Case Control Consortium. Nature. 2007.

Abstract

There is increasing evidence that genome-wide association (GWA) studies represent a powerful approach to the identification of genes involved in common human diseases. We describe a joint GWA study (using the Affymetrix GeneChip 500K Mapping Array Set) undertaken in the British population, which has examined approximately 2,000 individuals for each of 7 major diseases and a shared set of approximately 3,000 controls. Case-control comparisons identified 24 independent association signals at P < 5 × 10⁻⁷: 1 in bipolar disorder, 1 in coronary artery disease, 9 in Crohn's disease, 3 in rheumatoid arthritis, 7 in type 1 diabetes and 3 in type 2 diabetes. On the basis of prior findings and replication studies thus far completed, almost all of these signals reflect genuine susceptibility effects. We observed association at many previously identified loci, and found compelling evidence that some loci confer risk for more than one of the diseases studied. Across all diseases, we identified a large number of further signals (including 58 loci with single-point P values between 10⁻⁵ and 5 × 10⁻⁷) likely to yield additional susceptibility loci. The importance of appropriately large samples was confirmed by the modest effect sizes observed at most loci identified. This study thus represents a thorough validation of the GWA approach. It has also demonstrated that careful use of a shared control group represents a safe and effective approach to GWA analyses of multiple disease phenotypes; has generated a genome-wide genotype database for future studies of common diseases in the British population; and has shown that, provided individuals with non-European ancestry are excluded, the extent of population stratification in the British population is generally modest. Our findings offer new avenues for exploring the pathophysiology of these important disorders.
We anticipate that our data, results and software, which will be widely available to other investigators, will provide a powerful resource for human genetics research.

Figures

Figure 1. Genome-wide scan for allele frequency differences between controls
a, P values from the trend test for differences between SNP allele frequencies in the two control groups, stratified by geographical region. SNPs have been excluded on the basis of failure in a test for Hardy-Weinberg equilibrium in either control group considered separately, a low call rate, or a minor allele frequency below 1%, but not on the basis of a difference between control groups. Green dots indicate SNPs with a P value < 1 × 10⁻⁵. b, Quantile-quantile plots of these test statistics. In this and subsequent quantile-quantile plots, the shaded region is the 95% concentration band (see Methods).
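The trend test named in this caption can be illustrated with a minimal sketch. It assumes the test is the standard 1-d.f. Cochran-Armitage trend test with additive scores (0, 1, 2) on genotype counts, which the page itself does not spell out. The function name `trend_test` and the example counts are hypothetical, and the sketch is unstratified, so it does not reproduce the stratification by geographical region described above.

```python
import math


def trend_test(group_a, group_b, scores=(0, 1, 2)):
    """Cochran-Armitage trend test on genotype counts (AA, AB, BB).

    group_a and group_b are the genotype counts in the two groups being
    compared (for example, the two control groups). Returns the 1-d.f.
    chi-square statistic and its P value. Illustrative sketch only.
    """
    totals = [a + b for a, b in zip(group_a, group_b)]
    n_a, n_b = sum(group_a), sum(group_b)
    n = n_a + n_b
    sum_x_a = sum(x * a for x, a in zip(scores, group_a))
    sum_x = sum(x * t for x, t in zip(scores, totals))
    sum_xx = sum(x * x * t for x, t in zip(scores, totals))
    denom = n_a * n_b * (n * sum_xx - sum_x ** 2)
    if denom == 0:
        # No variation in genotype or an empty group: no evidence either way.
        return 0.0, 1.0
    chi2 = n * (n * sum_x_a - n_a * sum_x) ** 2 / denom
    # Upper tail of a chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

Calling `trend_test((50, 100, 150), (150, 100, 50))` on these made-up counts gives a large statistic and a very small P value, while two groups with identical genotype proportions give a statistic of 0.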
Figure 2. Genome-wide picture of geographic variation
a, P values for the 11-d.f. test for difference in SNP allele frequencies between geographical regions, within the 9 collections. SNPs have been excluded using the project quality control filters described in Methods. Green dots indicate SNPs with a P value < 1 × 10⁻⁵. b, Quantile-quantile plots of these test statistics. SNPs at which the test statistic exceeds 100 are represented by triangles at the top of the plot, and the shaded region is the 95% concentration band (see Methods). Also shown in blue is the quantile-quantile plot resulting from removal of all SNPs in the 13 most differentiated regions (Table 1).
Figure 3. Quantile-quantile plots for seven genome-wide scans
For each of the seven disease collections, a quantile-quantile plot of the results of the trend test is shown in black for all SNPs that pass the standard project filters and have a minor allele frequency >1% and a missing data rate <1%. SNPs for which visual inspection revealed genotype-calling problems were excluded. These filters were chosen to minimize the influence of genotype-calling artefacts. Each quantile-quantile plot shown in black involves around 360,000 SNPs. SNPs at which the test statistic exceeds 30 are represented by triangles. Additional quantile-quantile plots, which also exclude all SNPs located in the regions of association listed in Table 3, are superimposed in blue (for BD, the exclusion of these SNPs has no visible effect on the plot, and for HT there are no such SNPs). The blue quantile-quantile plots show that departures in the extreme tail of the distribution of test statistics are due to regions with a strong signal for association.
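As a rough illustration of how the points of such a quantile-quantile plot could be computed, the sketch below pairs sorted observed statistics with expected quantiles of a chi-square distribution with 1 d.f. The plotting position (i - 0.5)/m, the function name `qq_points` and the return layout are assumptions made for illustration. The cap of 30 mirrors the triangles described in the caption, and the 95% concentration band is not computed here.

```python
from statistics import NormalDist


def qq_points(chi2_stats, cap=30.0):
    """Return (expected, plotted_observed, clipped) triples for a Q-Q plot.

    Expected values are quantiles of a chi-square distribution with 1 d.f.
    Observed statistics above `cap` are drawn at the cap and flagged, as
    one would do to mark them with triangles. Illustrative sketch only.
    """
    observed = sorted(chi2_stats)
    m = len(observed)
    normal = NormalDist()
    points = []
    for i, obs in enumerate(observed, start=1):
        q = (i - 0.5) / m  # assumed plotting position
        # If Z is standard normal, Z**2 is chi-square with 1 d.f., so the
        # q-th chi-square quantile is the squared (1 + q) / 2 normal quantile.
        z = normal.inv_cdf((1 + q) / 2)
        points.append((z * z, min(obs, cap), obs > cap))
    return points
```

Under the null hypothesis the points fall close to the diagonal. Systematic inflation lifts the whole curve, whereas a few strongly associated regions affect only the upper tail, which is the pattern the blue plots are used to demonstrate.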
Figure 4. Genome-wide scan for seven diseases
For each of the seven diseases, -log10 of the trend test P value is plotted against position on each chromosome for quality-control-positive SNPs, excluding those removed in each disease for poor clustering on visual inspection. Chromosomes are shown in alternating colours for clarity, with P values < 1 × 10⁻⁵ highlighted in green. All panels are truncated at -log10(P value) = 15, although some markers (for example, in the MHC in T1D and RA) exceed this value.
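The transformation described in this caption is simple enough to sketch directly. The helper below is hypothetical (its name, defaults and return layout are not from the source). It converts a P value to the plotted height, truncates it at 15 and flags whether the point would be highlighted.

```python
import math


def manhattan_point(p_value, cap=15.0, highlight_below=1e-5):
    """Return (plotted_height, highlighted) for one SNP.

    plotted_height is -log10(P) truncated at `cap`; highlighted is True
    when P is strictly below `highlight_below`. Illustrative sketch only.
    """
    height = -math.log10(p_value) if p_value > 0 else math.inf
    return min(height, cap), p_value < highlight_below
```

For example, a P value of 1e-20 is drawn at the truncation height of 15 and highlighted, while a P value of 0.01 is drawn at a height of about 2 and not highlighted.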
Figure 5. Regions of the genome showing strong evidence of association
Characteristics of genomic regions 1.25 Mb to either side of ‘hit SNPs’ (SNPs with the lowest P values). Region boundaries (vertical dotted lines) were chosen to coincide with locations where test statistics returned to background levels and, where possible, recombination hotspots. Upper panel, -log10(P values) for the test (trend or genotypic) with the smallest P value at the hit SNP. Black points represent SNPs typed in the study, and grey points represent SNPs whose genotypes were imputed. SNPs imputed with higher confidence are shown in darker grey. Middle panel, fine-scale recombination rate (centimorgans per Mb) estimated from Phase II HapMap. The purple line shows the cumulative genetic distance (in cM) from the hit SNP. Lower panel, known genes, and sequence conservation in 17 vertebrates. Known genes (orange) in the hit region are listed in the upper right part of each plot in chromosomal order, starting at the left edge of the region. The top track shows plus-strand genes and the middle track shows minus-strand genes. Sequence conservation (bottom track) scores are based on the phylogenetic hidden Markov model phastCons. Highly conserved regions (phastCons score ≥ 600) are shown in blue. Information in middle and lower panels is taken from the UCSC Genome Browser. Positions are in NCBI build-35 coordinates. See Supplementary Information on ‘signal plots.’
Figure 6. Strong associations in subsamples of our data
For the 16 SNPs in Table 3 (outside the MHC) with P values for the trend test below 5 × 10⁻⁷, we randomly generated 1,000 subsets of our full data set corresponding to case-control studies with different numbers of cases, and the same number of controls (x axis). The y axis gives the proportion of subsamples of a given size in which that SNP achieved a P value for the trend test below 5 × 10⁻⁷. SNPs are numbered according to the row in which they occur in Table 3 (so that, for example, the CAD hit is numbered 2, and the TCF7L2 hit on chromosome 10 for T2D is numbered 20).
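The subsampling procedure in this caption can be sketched as follows. The sketch rests on two assumptions not confirmed by the page: that "the same number of controls" means each subsample draws equally many cases and controls, and that the trend test is the standard 1-d.f. Cochran-Armitage test. The function names and the genotype coding (0, 1, 2 copies of an allele) are hypothetical, and the data in the usage note are invented.

```python
import math
import random


def trend_p(cases, controls, scores=(0, 1, 2)):
    """P value of a 1-d.f. Cochran-Armitage trend test on genotype counts."""
    totals = [r + s for r, s in zip(cases, controls)]
    n_cases, n_controls = sum(cases), sum(controls)
    n = n_cases + n_controls
    sum_x_cases = sum(x * r for x, r in zip(scores, cases))
    sum_x = sum(x * t for x, t in zip(scores, totals))
    sum_xx = sum(x * x * t for x, t in zip(scores, totals))
    denom = n_cases * n_controls * (n * sum_xx - sum_x ** 2)
    if denom == 0:
        return 1.0
    chi2 = n * (n * sum_x_cases - n_cases * sum_x) ** 2 / denom
    return math.erfc(math.sqrt(chi2 / 2))


def subsample_power(case_genos, control_genos, size,
                    n_rep=1000, alpha=5e-7, seed=0):
    """Proportion of random subsamples in which the trend test P < alpha.

    case_genos and control_genos are per-individual genotypes coded 0/1/2.
    Each replicate draws `size` cases and `size` controls without
    replacement (an assumed reading of the caption).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rep):
        cases = rng.sample(case_genos, size)
        controls = rng.sample(control_genos, size)
        case_counts = [cases.count(g) for g in (0, 1, 2)]
        control_counts = [controls.count(g) for g in (0, 1, 2)]
        if trend_p(case_counts, control_counts) < alpha:
            hits += 1
    return hits / n_rep
```

Evaluating `subsample_power` over a range of `size` values for one SNP would trace a curve like those in the figure: the proportion is near zero for small studies and rises towards one as the sample grows, at a rate governed by the effect size.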
