Although case-control association studies have been widely used, they are insufficient for many complex diseases, such as Alzheimer's disease and breast cancer, because these diseases may have multiple subtypes with distinct morphologies and clinical implications. Many multigroup studies, such as the Alzheimer's Disease Neuroimaging Initiative (ADNI), recruit subjects according to their multiclass primary disease status while also collecting extensive secondary outcomes. The aim of this paper is to develop a general regression framework for the analysis of secondary phenotypes collected in multigroup association studies. Our framework is built on a conditional model for the secondary outcome given the multigroup status and covariates, and on the relationship between this model and the population regression of interest, namely that of the secondary outcome given the covariates alone. We then develop generalized estimating equations to estimate the parameters of interest. Using both simulations and a large-scale imaging genetic data analysis from the ADNI, we evaluate the effect of the multigroup sampling scheme on standard genome-wide association analyses based on linear regression, and compare these analyses with our methods, which appropriately adjust for the multigroup sampling scheme. Data used in preparation of this article were obtained from the ADNI database.
Keywords: ascertainment; genome-wide association study; multigroup; secondary trait; selection bias.
© 2019 The International Biometric Society.