Analysis of secondary phenotypes in multigroup association studies

Biometrics. 2020 Jun;76(2):606-618. doi: 10.1111/biom.13157. Epub 2019 Nov 11.


Although case-control association studies have been widely used, they are insufficient for many complex diseases, such as Alzheimer's disease and breast cancer, since these diseases may have multiple subtypes with distinct morphologies and clinical implications. Many multigroup studies, such as the Alzheimer's Disease Neuroimaging Initiative (ADNI), have been undertaken by recruiting subjects based on their multiclass primary disease status, while extensive secondary outcomes have been collected. The aim of this paper is to develop a general regression framework for the analysis of secondary phenotypes collected in multigroup association studies. Our regression framework is built on a conditional model for the secondary outcome given the multigroup status and covariates and its relationship with the population regression of interest of the secondary outcome given the covariates. Then, we develop generalized estimation equations to estimate the parameters of interest. We use both simulations and a large-scale imaging genetic data analysis from the ADNI to evaluate the effect of the multigroup sampling scheme on standard genome-wide association analyses based on linear regression methods, while comparing it with our statistical methods that appropriately adjust for the multigroup sampling scheme. Data used in preparation of this article were obtained from the ADNI database.
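
The sampling problem described above can be sketched with a small simulation. This is only an illustration under stated assumptions, not the estimating-equation method developed in the paper: it swaps in a simple inverse-probability-weighted (IPW) least-squares fit, which assumes the population prevalence of each group is known. The liability model, the effect sizes, the 80%/15%/5% group prevalences, and the function names (`simulate_population`, `multigroup_sample`, `wls_slope`) are all hypothetical choices made for the example.

```python
import numpy as np


def simulate_population(n_pop, beta, rng):
    """Simulate a SNP g, a secondary phenotype y, and a 3-level disease group d.

    Assumed (hypothetical) model: y = beta * g + noise, and the disease
    liability depends on both g and y, so group membership carries
    information about the g-y relationship.
    """
    g = rng.binomial(2, 0.3, size=n_pop).astype(float)  # minor-allele count
    y = beta * g + rng.normal(size=n_pop)                # secondary phenotype
    liability = 0.5 * g + 0.8 * y + rng.normal(size=n_pop)
    cuts = np.quantile(liability, [0.80, 0.95])          # assumed prevalences
    d = np.digitize(liability, cuts)                     # 0, 1, or 2
    return g, y, d


def multigroup_sample(d, n_per_group, rng):
    """Draw the same number of subjects from each disease group."""
    picks = [
        rng.choice(np.flatnonzero(d == k), size=n_per_group, replace=False)
        for k in range(3)
    ]
    return np.concatenate(picks)


def wls_slope(g, y, w):
    """Slope of y on g from weighted least squares with an intercept."""
    X = np.column_stack([np.ones_like(g), g])
    XtW = X.T * w                                        # weight each subject
    coef = np.linalg.solve(XtW @ X, XtW @ y)
    return coef[1]


rng = np.random.default_rng(0)
beta_true = 0.3
g, y, d = simulate_population(400_000, beta_true, rng)

idx = multigroup_sample(d, 5_000, rng)
gs, ys, ds = g[idx], y[idx], d[idx]

# IPW weights: population share of each group divided by its sample share.
prev = np.bincount(d, minlength=3) / d.size
frac = np.bincount(ds, minlength=3) / ds.size
w = prev[ds] / frac[ds]

naive = wls_slope(gs, ys, np.ones_like(gs))  # ignores the sampling scheme
ipw = wls_slope(gs, ys, w)                   # reweights to the population

print(f"true slope: {beta_true:.3f}")
print(f"naive OLS : {naive:.3f}")
print(f"IPW WLS   : {ipw:.3f}")
```

The weighted fit targets the population regression of the secondary outcome on the covariate, whereas the unweighted fit targets the regression in the pooled, group-enriched sample; the two need not agree when the disease status depends on both variables.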

Keywords: ascertainment; genome-wide association study; multigroup; secondary trait; selection bias.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alzheimer Disease / diagnostic imaging
  • Alzheimer Disease / genetics
  • Apolipoproteins E / genetics
  • Biometry
  • Case-Control Studies
  • Cognitive Dysfunction / diagnostic imaging
  • Cognitive Dysfunction / genetics
  • Computer Simulation
  • Genetic Association Studies / statistics & numerical data
  • Hippocampus / diagnostic imaging
  • Humans
  • Likelihood Functions
  • Linear Models
  • Membrane Transport Proteins / genetics
  • Mitochondrial Precursor Protein Import Complex Proteins
  • Models, Statistical
  • Monte Carlo Method
  • Neuroimaging / statistics & numerical data
  • Phenotype*
  • Polymorphism, Single Nucleotide
  • Regression Analysis*

Substances

  • ApoE protein, human
  • Apolipoproteins E
  • Membrane Transport Proteins
  • Mitochondrial Precursor Protein Import Complex Proteins
  • TOMM40 protein, human