Review
Nat Rev Genet. 2012 Jan 18;13(2):135-45.
doi: 10.1038/nrg3118.

Rare and Common Variants: Twenty Arguments

Greg Gibson. Nat Rev Genet.
Free PMC article

Abstract

Genome-wide association studies (GWAS) have greatly improved our understanding of the genetic basis of disease risk. The fact that they tend to identify only a fraction of the specific causal loci has led to a divergence of opinion over whether most of the variance is hidden as numerous rare variants of large effect or as common variants of very small effect. Here I review 20 arguments for and against each of these models of the genetic basis of complex traits and conclude that both classes of effect can be readily reconciled.

Conflict of interest statement

The author declares no competing financial interests.

Figures

Figure 1. Different expected signatures from GWAS for four models of disease
Each plot shows the approximate expected distribution of SNP effects for a modest study of 2,000 cases and controls. The y-axis is the percentage of the variance in trait or disease liability in the population explained by each SNP (note that standard Manhattan plots instead show significance, as the negative log10 of the p-value), and the x-axis is the position of tens of thousands of SNPs along the chromosome. In the common disease-common variant (CD-CV) model, a small number of moderate-effect loci would produce very strong signals, each explaining several percent of the genetic variance; note the expanded y-axis scale in this plot relative to the others. In the rare allele model, causal variant effects (yellow dots) may be large in a few individuals, but the variants are not common enough to explain much variance or to reach genome-wide significance. The infinitesimal model, by contrast, does produce some significant peaks due to the small effects of common variants, and in each case several SNPs within an LD block associate with the trait. Finally, it can be argued that if associations are seen only in some environments (green and orange signals, bottom right), then in a mixed population the overall effect at such loci will be reduced (arrowheads), fewer associations will be detected, and less of the variance will be explained.
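The y-axis quantity in these panels, the share of variance explained by one SNP, follows from allele frequency and effect size. The sketch below assumes a purely additive model on a standardized trait (total variance 1) under Hardy-Weinberg equilibrium, where one SNP explains 2p(1 - p)β² of the variance. The function name and the three example loci are hypothetical and chosen only to mirror the intuition of the CD-CV, rare allele, and infinitesimal panels; they are not taken from the figure.

```python
def variance_explained(freq: float, beta: float, total_variance: float = 1.0) -> float:
    """Fraction of phenotypic variance explained by one additive biallelic SNP.

    Assumes Hardy-Weinberg equilibrium and an allele substitution effect `beta`
    measured in phenotypic standard deviations: 2p(1 - p) * beta^2 / Vp.
    """
    if not 0.0 < freq < 1.0:
        raise ValueError("allele frequency must lie strictly between 0 and 1")
    return 2.0 * freq * (1.0 - freq) * beta ** 2 / total_variance


if __name__ == "__main__":
    # Hypothetical (frequency, effect) pairs loosely matching three of the panels.
    scenarios = {
        "CD-CV: common, moderate effect": (0.30, 0.40),
        "rare allele: rare, large effect": (0.001, 0.50),
        "infinitesimal: common, tiny effect": (0.30, 0.05),
    }
    for label, (p, b) in scenarios.items():
        pct = 100.0 * variance_explained(p, b)
        print(f"{label:36s} p={p:<6} beta={b:<5} -> {pct:.3f}% of variance")
```

Because the frequency term 2p(1 - p) shrinks toward zero for rare alleles, even a large effect can leave a rare variant explaining very little variance, which is the contrast the rare allele panel draws.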
Figure 2. Expected distribution of risk variants
(a) The approximate frequency distribution of risk alleles in cases (blue) and controls (red) under the infinitesimal model for a disease with high heritability and 10% prevalence. For this particular parameterization I assumed 200 loci with risk allele frequencies from 0.1 to 0.9, skewed toward lower frequencies. Each risk allele is assumed to increase the probability of disease additively by 1.04 relative to the overall risk of 10%. The frequency distribution in cases is skewed to the right, but note that the median number of risk alleles in affected individuals is only slightly greater than in unaffected individuals. (b) The approximate frequency distribution of risk alleles under a multiplicative rare allele model for a disease with high heritability and 1% prevalence. This parameterization assumes 100 loci, each with a risk allele frequency of 1%, such that each risk allele increases the probability of disease 2.5-fold over a background risk of 0.2%. The vast majority of unaffected individuals carry at least one risk allele (the yellow bar shows the expected number of individuals without any risk alleles). The inset shows the same figure on a log scale, emphasizing how relative risk increases with the number of variants carried. Note that the measured per-allele genotypic relative risk (GRR) across the population, in the presence of 100 other alleles, is ~1.15, much smaller than the 2.5-fold multiplicative risk due to a single variant. For higher risks (say, 5-fold) with 100 alleles, the variants must be very rare (~0.1%) for a disease prevalence of 1%, and affected individuals will carry only one or two risk alleles.
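The multiplicative rare allele model in panel (b) can be sketched with a binomial count of risk alleles and Bayes' rule to split the population into affected and unaffected individuals. This is a minimal illustration under stated assumptions, not the parameterization used to draw the figure: loci are assumed independent, each person's count K of risk alleles is assumed to be Binomial(n_copies, f), and risk is assumed to be background × rr^K, capped at 1. The caption does not say whether "100 loci" means 100 or 200 allele copies per person, so `n_copies` is left as a parameter; all function names are hypothetical.

```python
from math import comb


def count_distribution(n_copies: int, freq: float) -> list[float]:
    """P(K = k) for k = 0..n_copies, assuming K ~ Binomial(n_copies, freq)."""
    return [comb(n_copies, k) * freq ** k * (1.0 - freq) ** (n_copies - k)
            for k in range(n_copies + 1)]


def risk_given_count(k: int, background: float, rr: float) -> float:
    """Assumed multiplicative model: each risk allele multiplies risk by rr (capped at 1)."""
    return min(1.0, background * rr ** k)


def case_control_distributions(n_copies: int, freq: float, background: float, rr: float):
    """Return (prevalence, P(K=k | affected), P(K=k | unaffected)) via Bayes' rule."""
    pk = count_distribution(n_copies, freq)
    risks = [risk_given_count(k, background, rr) for k in range(n_copies + 1)]
    prevalence = sum(p * r for p, r in zip(pk, risks))
    affected = [p * r / prevalence for p, r in zip(pk, risks)]
    unaffected = [p * (1.0 - r) / (1.0 - prevalence) for p, r in zip(pk, risks)]
    return prevalence, affected, unaffected


def mean_count(dist: list[float]) -> float:
    """Mean number of risk alleles carried under a count distribution."""
    return sum(k * p for k, p in enumerate(dist))


if __name__ == "__main__":
    # Caption values: 1% risk-allele frequency, 2.5-fold per-allele risk, 0.2% background.
    # n_copies = 100 counts one copy per locus; 200 treats the 100 loci as diploid.
    for n_copies in (100, 200):
        prev, aff, unaff = case_control_distributions(n_copies, 0.01, 0.002, 2.5)
        print(f"n_copies={n_copies}: prevalence={prev:.4f}, "
              f"unaffected with zero risk alleles={unaff[0]:.3f}, "
              f"mean alleles affected={mean_count(aff):.2f} "
              f"vs unaffected={mean_count(unaff):.2f}")
```

Working with the exact binomial rather than a simulation keeps the case and control distributions deterministic, and the resulting prevalence is sensitive to the allele-copy assumption, which is worth checking against whatever convention the figure used.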
Figure 3. Inconsistency between GWAS results and rare variant expectations
(A) The distribution of risk allele frequencies (light red) for 414 common variant associations with 17 diseases is only slightly skewed toward lower-frequency variants. By contrast, simulations (light green bars), in this case assuming up to 9 rare causal variants that induce the common variant association at SNPs with the same frequencies as observed on common genotyping platforms, result in a marked left skew, with a peak for common variants whose frequency is less than 10%. (The skew is even stronger if only a single causal variant is responsible.) The observed data are thus not immediately consistent with the rare variant model. (B) Part of the problem with synthetic associations is that they would explain too much heritability if they were pervasively responsible for common variant effects. This follows from the relationship among allele frequency, maximum possible LD, and the amount of variance explained [19]. The plot shows the odds ratio expected for a rare variant of the indicated frequency (from 0.5% to 2%) if it increases the odds ratio at a common SNP (with which it is in maximum possible LD) 1.1-fold. Intermediate effect sizes (2
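The arithmetic behind panel (B) can be approximated in a few lines. This sketch rests on simplifying assumptions of my own, not the calculation behind the figure: the disease is rare, so odds ratios are treated as relative risks; risk is allelic (haplotype-level) and multiplicative; and the rare allele (frequency p) sits only on haplotypes carrying the common SNP's risk allele (frequency q ≥ p), which is the maximum-LD configuration. Under those assumptions a fraction p/q of common risk-allele haplotypes carry the rare variant, so the apparent relative risk at the common SNP is 1 + (p/q)(R - 1). The log-odds variance comparison is only a crude proxy for variance explained, and the tag-SNP frequency q = 0.3 and all function names are hypothetical.

```python
from math import log


def max_r2(p: float, q: float) -> float:
    """Maximum r^2 between a rare allele (freq p) and a common allele (freq q), p <= q."""
    if not 0.0 < p <= q < 1.0:
        raise ValueError("expected 0 < p <= q < 1")
    return p * (1.0 - q) / (q * (1.0 - p))


def required_rare_rr(p: float, q: float, observed_rr: float = 1.1) -> float:
    """Rare-allele relative risk R needed to create `observed_rr` at the common SNP.

    Solves observed_rr = 1 + (p / q) * (R - 1) for R, assuming the rare allele lies
    only on the common risk-allele background and OR ~ RR for a rare disease.
    """
    if not 0.0 < p <= q < 1.0:
        raise ValueError("expected 0 < p <= q < 1")
    return 1.0 + (observed_rr - 1.0) * q / p


def log_odds_variance(freq: float, odds_ratio: float) -> float:
    """Crude proxy for variance explained on the log-odds scale: 2f(1-f)(ln OR)^2."""
    return 2.0 * freq * (1.0 - freq) * log(odds_ratio) ** 2


if __name__ == "__main__":
    q = 0.30  # hypothetical frequency of the common tag SNP's risk allele
    for p in (0.005, 0.01, 0.02):  # rare-variant frequencies spanning the caption's range
        rr = required_rare_rr(p, q)
        ratio = log_odds_variance(p, rr) / log_odds_variance(q, 1.1)
        print(f"p={p:.3f}: max r^2={max_r2(p, q):.4f}, required rare RR={rr:.2f}, "
              f"variance ratio (rare/common)={ratio:.1f}")
```

Under these assumptions the rare variant needs a relative risk several times larger than the 1.1 seen at the tag SNP, and it then accounts for considerably more variance than the common association it creates, which is the direction of the argument in panel (B).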
Figure 4. Joint effects of rare and common variants
A straightforward reconciliation of the effects of rare and common variants supposes that pervasive common variation influences the expression and activity of genes in pathways, establishing a background liability to disease that is then further modified by rare variants of larger effect. In this hypothetical example of central metabolism, standing variation results in some individuals having lower flux than others (left versus right; colored boxes indicate differences in enzyme activity, from low, red, to high, green), but according to standard biochemical theory, systems evolve such that most variation is accommodated within the healthy range. The impact of a rare variant that knocks out one copy of the enzyme marked with the cross is conditional on this liability: it pushes the individual on the left beyond the disease threshold, whereas the individual on the right can accommodate the mutation because of higher activity elsewhere in glycolysis.
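The conditional effect described in Figure 4 can be mimicked with a toy pathway model. Everything here is a hypothetical illustration rather than the model behind the figure: steady-state flux through a linear pathway is assumed to be 1 / Σ(1 / e_i), a simplification in which each step adds a "resistance" 1/e_i; losing one enzyme copy is assumed to halve that enzyme's activity; and the disease threshold of 0.13 and the two activity profiles are arbitrary numbers chosen to reproduce the qualitative story.

```python
def pathway_flux(activities: list[float]) -> float:
    """Assumed steady-state flux of a linear pathway: 1 / sum(1 / e_i)."""
    return 1.0 / sum(1.0 / a for a in activities)


def heterozygous_knockout(activities: list[float], index: int) -> list[float]:
    """Return a copy in which one enzyme retains half its activity (one copy lost)."""
    mutated = list(activities)
    mutated[index] *= 0.5
    return mutated


DISEASE_THRESHOLD = 0.13  # arbitrary illustrative cutoff on pathway flux


def is_affected(activities: list[float], threshold: float = DISEASE_THRESHOLD) -> bool:
    """An individual is 'affected' when pathway flux falls below the threshold."""
    return pathway_flux(activities) < threshold


if __name__ == "__main__":
    # Hypothetical four-step pathway: low-activity (left) vs high-activity (right) background.
    individuals = {"left (low background)": [0.6] * 4, "right (high background)": [1.4] * 4}
    for label, enzymes in individuals.items():
        mutant = heterozygous_knockout(enzymes, index=1)
        print(f"{label}: flux {pathway_flux(enzymes):.3f} -> {pathway_flux(mutant):.3f}, "
              f"affected before={is_affected(enzymes)}, after={is_affected(mutant)}")
```

Both backgrounds start above the threshold, so the standing variation alone is tolerated; the same heterozygous knockout then tips only the low-activity individual into the affected range, because the other steps in that pathway leave less spare capacity.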
