PLoS Biol, 8 (1), e1000294

Rare Variants Create Synthetic Genome-Wide Associations


Samuel P Dickson et al. PLoS Biol.


Genome-wide association studies (GWAS) have now identified at least 2,000 common variants that appear associated with common diseases or related traits, hundreds of which have been convincingly replicated. It is generally thought that the associated markers reflect the effect of a nearby common (minor allele frequency >0.05) causal site, which is associated with the marker, leading to extensive resequencing efforts to find causal sites. We propose as an alternative explanation that variants much less common than the associated one may create "synthetic associations" by occurring, stochastically, more often in association with one of the alleles at the common site than with the other allele. Although synthetic associations are an obvious theoretical possibility, they have never been systematically explored as a possible explanation for GWAS findings. Here, we use simple computer simulations to show the conditions under which such synthetic associations will arise and how they may be recognized. We show that they are not only possible, but inevitable, and that under simple but reasonable genetic models, they are likely to account for or contribute to many of the recently identified signals reported in genome-wide association studies. We also illustrate the behavior of synthetic associations in real datasets by showing that rare causal mutations responsible for both hearing loss and sickle cell anemia create genome-wide significant synthetic associations, in the latter case extending over a 2.5-Mb interval encompassing scores of "blocks" of associated variants. In conclusion, uncommon or rare genetic variants can easily create synthetic associations that are credited to common variants, and this possibility requires careful consideration in the interpretation and follow-up of GWAS signals.
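
To make the mechanism concrete, here is a minimal arithmetic sketch. It is not the authors' simulation. It assumes nine rare causal variants (the number used in Figure 1) that all happen to lie on haplotypes carrying the same allele of a common marker, a relative risk of 4 applied per carrier haplotype (a simplification of the genotype relative risk), and a disease rare enough that control frequencies match the population. The marker frequency (0.30), the per-variant frequency (0.005), and all function names are hypothetical values chosen for illustration.

```python
def marker_frequency_in_cases(marker_freq, causal_freq_on_marker, relative_risk):
    """Frequency of the common marker allele among case haplotypes.

    Assumes every rare causal variant sits on the marker-allele background
    and that carrier haplotypes are over-sampled into cases by relative_risk.
    """
    # Risk-weighted share of haplotypes that carry marker allele + causal variant.
    carriers = causal_freq_on_marker * relative_risk
    # Marker-allele haplotypes without any causal variant (baseline risk).
    marker_noncarriers = marker_freq - causal_freq_on_marker
    # Haplotypes with the other marker allele (baseline risk).
    other = 1.0 - marker_freq
    return (carriers + marker_noncarriers) / (carriers + marker_noncarriers + other)


def odds(p):
    """Convert a frequency into odds."""
    return p / (1.0 - p)


marker_freq = 0.30            # hypothetical common-marker allele frequency
causal_total = 9 * 0.005      # nine rare variants, hypothetical frequency each
relative_risk = 4.0           # effect size quoted in the figure captions

case_freq = marker_frequency_in_cases(marker_freq, causal_total, relative_risk)
# Rare-disease assumption: control frequency equals the population frequency.
synthetic_odds_ratio = odds(case_freq) / odds(marker_freq)

print(f"marker frequency in cases: {case_freq:.3f}")
print(f"apparent odds ratio at the common marker: {synthetic_odds_ratio:.2f}")
```

Under these assumptions the common marker shows a modest apparent effect even though it has no causal role; the signal comes entirely from the rare variants that share its haplotype background.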

Conflict of interest statement

The authors have declared that no competing interests exist.


Figure 1. Example genealogies showing causal variants and the strongest association for a common variant.
(A) A genealogy with 10,000 original haplotypes was generated with 3,000 cases and 3,000 controls, genotype relative risk (γ) = 4, and nine causal variants. The branches containing the strongest synthetic association are indicated in blue. The branches containing the rare causal variants are in red. (B) A second genealogy was generated using the same parameters. These genealogies demonstrate two scenarios with genome-wide significant synthetic associations: the first (upper genealogy) had a high risk allele frequency (RAF = 0.49), and the second (lower genealogy) had a low RAF (0.08).
Figure 2. The proportion of simulations with a variant of genome-wide significance.
Results for rare variants are shown in red; for the top hit among common variants, results are shown in black; and in blue are the results for the next best hit for common variants after including the top hit in the regression model. At the bottom of each graph, the simulation parameters are represented graphically. Results across all parameters with no recombination are shown in (A) with the shaded region representing the effect size at which linkage analysis is expected to begin generating consistent signals (GRR = 4). Results for simulations that included recombination are shown in (B). The shaded region in (B) is the same as the shaded region in (A), with the rate of recombination for the same parameters increasing along the x-axis.
Figure 3. The proportion of simulations with a variant of genome-wide significance separated by disease class.
Increasing the number of causal variants generally increases the probability of creating synthetic associations by increasing the size of the disease class without increasing the allele frequency of causal variants. Within a disease class, increasing the number of causal variants decreases the probability of creating a synthetic association.
Figure 4. Mean and variance of r² between rare and common sites as a function of rate of recombination.
A total of 100,000 simulations of two loci with multiple variants in each locus show how the mean and variance of estimates of r² between rare and common variants are affected by recombination. Although the mean is a nonincreasing function of recombination, the variance increases then decreases, which shows why the maximum r² between rare and common variants can increase with low amounts of recombination in a region.
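
For readers unfamiliar with the statistic, the following sketch computes r² between a rare site and a common site from phased haplotypes using the standard definition, r² = D² / (p₁(1 − p₁)p₂(1 − p₂)) with D = p₁₂ − p₁p₂. The function name and the toy haplotype counts are assumptions made for illustration and are not taken from the paper's simulations.

```python
def r_squared(haplotypes):
    """r^2 between two biallelic sites.

    haplotypes: list of (rare_allele, common_allele) pairs, each coded 0 or 1.
    """
    n = len(haplotypes)
    p_rare = sum(h[0] for h in haplotypes) / n
    p_common = sum(h[1] for h in haplotypes) / n
    p_both = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n
    d = p_both - p_rare * p_common          # linkage disequilibrium coefficient D
    denom = p_rare * (1 - p_rare) * p_common * (1 - p_common)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


# Toy sample of 100 haplotypes (hypothetical): the rare allele occurs twice,
# both times on the background of the common allele.
sample = [(1, 1)] * 2 + [(0, 1)] * 28 + [(0, 0)] * 70
r2 = r_squared(sample)
print(f"r^2 = {r2:.4f}")
```

In this toy sample the rare allele is completely nested within the common allele, yet r² remains small because the two allele frequencies differ so much.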
Figure 5. Allele frequency distributions of all HapMap SNPs (black), Illumina 1M SNPs (blue), GWAS associations in CEU (red), and simulated synthetic associations (green).
The distributions include both minor and major allele frequencies. GWAS associations have a clear tendency towards the center, representing greater power to detect association with variants with higher minor allele frequencies. CEU = population of western European ancestry.
Figure 6. Simulated Manhattan plots in a 10-Mb region.
(A) This region has nine rare causal variants selected at random with GRR = 4 and 3,000 cases and 3,000 controls. (B) The same region with permuted phenotypes shows what the region would look like without any association.
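
Phenotype permutation of the kind described for panel (B) can be sketched as follows. This is a generic illustration, not the authors' code: shuffling case/control labels breaks any link between genotype and phenotype while preserving the numbers of cases and controls. The function names, the toy genotypes, and the choice of test statistic (difference in mean allele count) are all assumptions.

```python
import random


def permute_phenotypes(phenotypes, rng):
    """Return a shuffled copy of the case/control labels."""
    shuffled = list(phenotypes)
    rng.shuffle(shuffled)
    return shuffled


def allele_count_difference(genotypes, phenotypes):
    """Mean allele count in cases (label 1) minus mean in controls (label 0)."""
    cases = [g for g, p in zip(genotypes, phenotypes) if p == 1]
    controls = [g for g, p in zip(genotypes, phenotypes) if p == 0]
    return sum(cases) / len(cases) - sum(controls) / len(controls)


# Toy data (hypothetical): allele counts at one SNP for 4 cases and 4 controls.
genotypes = [2, 1, 1, 0, 1, 0, 0, 0]
phenotypes = [1, 1, 1, 1, 0, 0, 0, 0]

observed = allele_count_difference(genotypes, phenotypes)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
null = [
    allele_count_difference(genotypes, permute_phenotypes(phenotypes, rng))
    for _ in range(1000)
]
# Fraction of permutations at least as extreme as the observed difference.
empirical_p = sum(1 for x in null if x >= observed) / len(null)
print(f"observed difference: {observed}, empirical p: {empirical_p}")
```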
Figure 7. The 2.5-Mb genomic region on chr11p15.4 containing 179 genome-wide significant synthetic associations with sickle cell anemia in African Americans.
The −log10(p) values for all genome-wide significant SNPs are displayed in the upper track, whereas the LD patterns based on the HapMap YRI (Yoruba people of Ibadan, Nigeria) population are displayed in the lower track. The region contains dozens of genes spanning several discernible LD blocks.
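
As a reminder of how a −log10(p) value of this kind is obtained, the sketch below runs a one-degree-of-freedom allelic chi-square test on a 2×2 table of allele counts and converts the p-value to the −log10 scale. The allele counts are hypothetical and are not taken from the sickle cell dataset. The threshold of 5 × 10⁻⁸ is a commonly used convention assumed here; the text above does not state which threshold the authors applied.

```python
import math


def allelic_chi_square(case_alt, case_total, control_alt, control_total):
    """Pearson chi-square (1 df, no continuity correction) on allele counts."""
    a, b = case_alt, case_total - case_alt
    c, d = control_alt, control_total - control_alt
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("table has an empty row or column")
    return n * (a * d - b * c) ** 2 / denom


def chi_square_1df_p_value(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical counts: 3,000 cases and 3,000 controls give 6,000 alleles each.
chi2 = allelic_chi_square(2300, 6000, 1800, 6000)
p_value = chi_square_1df_p_value(chi2)
neg_log10_p = -math.log10(p_value)

ASSUMED_THRESHOLD = 5e-8  # conventional genome-wide level, an assumption here
print(f"chi-square = {chi2:.2f}, -log10(p) = {neg_log10_p:.1f}")
print("genome-wide significant:", p_value < ASSUMED_THRESHOLD)
```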
Figure 8. Overview of the GJB2/GJB6 locus on 13q12.11 in the hearing loss GWAS.
The three most significantly associated SNPs are in weak LD with one another. Although the most common causal variant (35delG) within GJB2 has a frequency of only 1.25% in European Americans, the locus can still be identified by GWAS with common tagging SNPs.



