Using extreme phenotype sampling to identify the rare causal variants of quantitative traits in association studies

Genet Epidemiol. 2011 Dec;35(8):790-9. doi: 10.1002/gepi.20628. Epub 2011 Sep 15.


Variants identified in recent genome-wide association studies based on the common-disease common-variant hypothesis are far from fully explaining the heritability of complex traits. Rare variants may explain part of this missing heritability. Here, we explored the advantages of extreme phenotype sampling in rare-variant analysis and refined this design framework for future large-scale association studies of quantitative traits. We first proposed a power calculation approach for a likelihood-based analysis method. We then used this approach to demonstrate the potential advantages of extreme phenotype sampling for rare variants. Next, we discussed how this design can influence future sequencing-based association studies from a cost-efficiency perspective that includes phenotyping costs. Moreover, we discussed the potential of a two-stage design with the extreme sample as the first stage and the remaining nonextreme subjects as the second stage. We demonstrated that this two-stage design is a cost-efficient alternative to both the one-stage cross-sectional design and the traditional two-stage design. We then discussed analysis strategies for this extreme two-stage design and proposed a corresponding design optimization procedure. To address practical concerns such as measurement error or phenotypic heterogeneity at the very extremes, we examined an approach in which individuals with very extreme phenotypes are discarded. We demonstrated that even with a substantial proportion of these extreme individuals discarded, extreme phenotype sampling can still be more efficient. Finally, we extended the current analysis and design framework to accommodate the combined multivariate and collapsing (CMC) approach, in which multiple rare variants in the same gene region are analyzed jointly.
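The abstract does not give the likelihood or power formulas, so the following is only a minimal simulation sketch of the intuition behind the design, not the authors' method. It assumes a hypothetical additive model in which carrying any rare allele in one gene region shifts a normally distributed trait by `beta`. It takes the lower and upper tails of the trait distribution and optionally discards the very extreme individuals within each tail. It then compares the rate of rare-variant carriers in each tail with the population rate. The carrier indicator mimics only the collapsing step of CMC. All function names (`cmc_collapse`, `extreme_sample`, `carrier_rate`, `simulate`) and parameter values are illustrative assumptions.

```python
import random


def cmc_collapse(genotypes):
    """CMC-style collapsing (illustrative): 1 if a person carries at least one
    rare allele in the region, else 0.

    genotypes: list of per-person lists of minor-allele counts (0, 1 or 2).
    """
    return [1 if any(g > 0 for g in row) else 0 for row in genotypes]


def extreme_sample(trait, tail_frac, discard_frac=0.0):
    """Return (lower, upper) index lists taken from the two trait tails.

    tail_frac: fraction of the cohort taken from each tail (assumed symmetric).
    discard_frac: fraction of the cohort at the very extreme of each tail that
    is dropped, e.g. because of suspected measurement error.
    """
    n = len(trait)
    order = sorted(range(n), key=lambda i: trait[i])
    k_tail = int(round(tail_frac * n))
    k_drop = int(round(discard_frac * n))
    if not 0 <= k_drop < k_tail <= n // 2:
        raise ValueError("need 0 <= discard_frac < tail_frac <= 0.5")
    lower = order[k_drop:k_tail]             # lowest values, minus the most extreme
    upper = order[n - k_tail:n - k_drop]     # highest values, minus the most extreme
    return lower, upper


def carrier_rate(indicator, idx):
    """Fraction of the individuals in idx whose collapsed indicator is 1."""
    idx = list(idx)
    return sum(indicator[i] for i in idx) / len(idx)


def simulate(n=20000, n_variants=10, maf=0.002, beta=1.0, seed=1):
    """Hypothetical model: each site's genotype is the sum of two Bernoulli(maf)
    alleles, and trait = beta * carrier + N(0, 1) noise."""
    rng = random.Random(seed)
    genotypes = [[int(rng.random() < maf) + int(rng.random() < maf)
                  for _ in range(n_variants)] for _ in range(n)]
    carrier = cmc_collapse(genotypes)
    trait = [beta * c + rng.gauss(0.0, 1.0) for c in carrier]
    return carrier, trait


if __name__ == "__main__":
    carrier, trait = simulate()
    everyone = range(len(carrier))
    for discard in (0.0, 0.02):
        lower, upper = extreme_sample(trait, tail_frac=0.1, discard_frac=discard)
        print(f"discard={discard}: population={carrier_rate(carrier, everyone):.3f}, "
              f"lower tail={carrier_rate(carrier, lower):.3f}, "
              f"upper tail={carrier_rate(carrier, upper):.3f}")
```

Under these assumptions the upper tail is enriched and the lower tail depleted for carriers, which is why sequencing only the tails can be more informative per sequenced individual. The second row shows how the enrichment changes when the most extreme individuals are discarded. Actual power comparisons require the likelihood-based analysis described in the paper.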

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Cost-Benefit Analysis
  • Gene Frequency
  • Genetic Variation*
  • Genome-Wide Association Study* / economics
  • Humans
  • Likelihood Functions
  • Models, Genetic*
  • Phenotype
  • Polymorphism, Single Nucleotide
  • Quantitative Trait, Heritable