A pooling strategy to effectively use genotype data in quantitative traits genome-wide association studies

Stat Med. 2018 Nov 30;37(27):4083-4095. doi: 10.1002/sim.7898. Epub 2018 Jul 12.

Abstract

The goal of quantitative trait genome-wide association studies is to identify associations between a phenotypic variable, such as a vitamin level, and genetic variants, often single-nucleotide polymorphisms. When funding limits the number of assays that can be performed to measure the level of the phenotypic variable, a subgroup of subjects is often randomly selected from the genotype database and the level of the phenotypic variable is then measured for each subject. Because only a proportion of the genotype data can be used, such a simple random sampling method may suffer from substantial loss of efficiency, especially when the number of assays is relatively small and the frequency of the less common variant (minor allele frequency) is low. We propose a pooling strategy in which subjects in a randomly selected reference subgroup are aligned with randomly selected subjects from the remaining study subjects to form independent pools; blood samples from subjects in each pool are mixed; and the level of the phenotypic variable is measured for each pool. We demonstrate that the proposed pooling approach produces considerable gains in efficiency over the simple random sampling method for inference concerning the phenotype-genotype association, resulting in higher precision and power. The methods are illustrated using genotypic and phenotypic data from the Trinity Students Study, a quantitative trait genome-wide association study.
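The design step described above (a random reference subgroup, each member aligned with randomly drawn subjects from the rest of the cohort) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `simple_random_sample` and `form_pools` are hypothetical, the pool size is a free parameter (the abstract does not fix one), and only pool formation is shown, not blood mixing, assay measurement, or the association analysis. The point it makes concrete is that, for the same number of assays, pooling draws on genotype data from more subjects than simple random sampling does.

```python
import random


def simple_random_sample(subject_ids, n_assays, rng):
    """Simple random sampling: one assay per randomly chosen subject."""
    return rng.sample(subject_ids, n_assays)


def form_pools(subject_ids, n_assays, pool_size, rng):
    """Hypothetical sketch of the pooling design.

    Each pool pairs one subject from a random reference subgroup with
    (pool_size - 1) subjects drawn at random from the remaining subjects.
    Pools are disjoint, so one assay covers pool_size genotyped subjects.
    """
    if n_assays * pool_size > len(subject_ids):
        raise ValueError("not enough subjects for the requested pools")
    # Randomly selected reference subgroup: one subject per assay.
    reference = rng.sample(subject_ids, n_assays)
    ref_set = set(reference)
    remaining = [s for s in subject_ids if s not in ref_set]
    # Draw all partners at once without replacement, so that no subject
    # appears in more than one pool (the pools are independent).
    n_partners = pool_size - 1
    partners = rng.sample(remaining, n_assays * n_partners)
    pools = []
    for i, ref in enumerate(reference):
        start = i * n_partners
        pools.append([ref] + partners[start:start + n_partners])
    return pools


if __name__ == "__main__":
    rng = random.Random(2018)
    subjects = list(range(1000))  # hypothetical genotyped cohort
    srs = simple_random_sample(subjects, 50, rng)
    pools = form_pools(subjects, 50, 2, rng)
    print(len(srs), "genotypes used by simple random sampling")
    print(sum(len(p) for p in pools), "genotypes used by pooling")
```

With 50 assays and pools of two, simple random sampling uses genotypes from 50 subjects, whereas the pooled design uses genotypes from 100 subjects for the same assay budget.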

Keywords: biallelic model; efficiency; estimation; group testing; homozygous minor allele; multiple comparison; phenotypes; pooling; power; random sampling; single nucleotide polymorphism.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, N.I.H., Intramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Data Interpretation, Statistical*
  • Genome-Wide Association Study / methods*
  • Genotype*
  • Humans
  • Models, Statistical
  • Polymorphism, Single Nucleotide
  • Quantitative Trait, Heritable*
  • Statistics as Topic