Reexamining Dis/Similarity-Based Tests for Rare-Variant Association with Case-Control Samples

Genetics. 2018 May;209(1):105-113. doi: 10.1534/genetics.118.300769. Epub 2018 Mar 15.


A properly designed distance-based measure can capture informative genetic differences among individuals with different phenotypes and can be used to detect variants responsible for the phenotypes. To detect associated variants, various tests have been designed to contrast genetic dissimilarity or similarity scores of certain subject groups in different ways, among which the most widely used strategy is to quantify the difference between the within-group genetic dissimilarity/similarity (i.e., case-case and control-control similarities) and the between-group dissimilarity/similarity (i.e., case-control similarities). While it has been noted that for common variants, the within-group and the between-group measures should all be included; in this work, we show that for rare variants, comparison based on the two within-group measures can more effectively quantify the genetic difference between cases and controls. The between-group measure tends to overlap with one of the two within-group measures for rare variants, although such overlap is not present for common variants. Consequently, a dissimilarity or similarity test that includes the between-group information tends to attenuate the association signals and leads to power loss. Based on these findings, we propose a dissimilarity test that compares the degree of SNP dissimilarity within cases to that within controls to better characterize the difference between two disease phenotypes. We provide the statistical properties, asymptotic distribution, and computation details for a small sample size of the proposed test. We use simulated and real sequence data to assess the performance of the proposed test, comparing it with other rare-variant methods including those similarity-based tests that use both within-group and between-group information. As similarity-based approaches serve as one of the dominating approaches in rare-variant analysis, our results provide some insight for the effective detection of rare variants.

Keywords: U-statistics; cross-sample comparison; within-sample comparison.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Case-Control Studies
  • Computer Simulation
  • Genetic Association Studies* / methods
  • Genetic Variation*
  • Humans
  • Models, Genetic*
  • Models, Statistical*
  • Phenotype
  • Sample Size