Family-based multi-locus tests integrate information from individual loci by taking a weighted average of the marginal statistics, and they have been shown to be more efficient and robust than single-locus tests in genetic association studies. Their power depends on how much information the weights can extract from the data. Currently published weighted-sum methods are applicable to either common or rare variants, but not both, and may suffer substantial power loss, especially for rare variants. In this paper, we propose a novel data-driven weight to improve power for both common and rare variants. We use the ℓ1 regularization of Least Absolute Shrinkage and Selection Operator (LASSO) regression to construct the weight, which simultaneously serves as an adaptive marker selection process. Simulations for a dichotomous phenotype demonstrated that our LASSO-based approach outperformed existing multi-locus methods, providing the highest statistical power while keeping the type I error rate well controlled across different scenarios. We also applied our methods to a real dataset for rheumatoid arthritis (GAW15 Problem 2). Two groups of alleles, in which individual SNPs had only modest and non-significant effects, were detected (P < 0.00001) using our proposed methods, whereas traditional multi-locus methods failed to identify them. In conclusion, the novel LASSO-based approach represents a superior weight-choosing strategy for multi-locus tests.
Keywords: Family-based design; Genetic association studies; LASSO; Multi-locus; Robust-efficient.
Copyright © 2020 Elsevier Ltd. All rights reserved.
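The idea summarized in the abstract, using ℓ1-penalized regression coefficients as weights so that markers with zero coefficients drop out of the combined statistic, can be illustrated with a minimal sketch. The abstract does not give the weight formula or the form of the marginal statistics, so everything here is an assumption made for illustration: the function names `lasso_cd` and `weighted_sum`, the penalty parameter `lam`, the squared-error loss, and the use of raw LASSO coefficients as weights are hypothetical and should not be read as the authors' procedure.

```python
def soft_threshold(x, lam):
    """Soft-thresholding operator, the closed-form l1 shrinkage step."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.

    X is a list of rows (e.g. centered genotype codes), y a centered
    response. Returns the coefficient list; exact zeros mean the marker
    was dropped by the penalty.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X b, with b = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # monomorphic column carries no information
            # Correlation of column j with the partial residual
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def weighted_sum(stats, weights):
    """Combine per-locus marginal statistics with the given weights."""
    return sum(w * z for w, z in zip(weights, stats))


# Toy data (assumed, not from the paper): two orthogonal, centered markers.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2.1, -1.9, 1.9, -2.1]        # strong signal at marker 0, weak at marker 1
weights = lasso_cd(X, y, lam=0.5)  # weak marker is shrunk to exactly zero
marginal_stats = [3.0, 1.0]        # placeholder per-locus statistics
combined = weighted_sum(marginal_stats, weights)
```

In this toy case the penalty removes the weakly associated marker from the combined statistic, which is the "simultaneous marker selection" behavior the abstract attributes to the LASSO weight. How the weights are estimated in a family-based design, and how the null distribution of the combined statistic is obtained, are not specified in the abstract and are outside this sketch.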