Next-Generation Sequencing Data-Based Association Testing of a Group of Genetic Markers for Complex Responses Using a Generalized Linear Model Framework

Mathematics (Basel). 2023 Jun 1;11(11):2560. doi: 10.3390/math11112560. Epub 2023 Jun 2.


Association testing has been widely used to study the relationship between genetic variants and phenotypes. Most association testing methods are genotype-based, i.e. first estimate genotype and then regress phenotype on estimated genotype and other variables. Directly testing methods based on next generation sequencing (NGS) data without genotype calling have been proposed and shown advantage over genotype-based methods in the scenarios when genotype calling is not accurate. NGS data-based single-variant testing have been proposed including our previously proposed single-variant testing method, i.e. UNC combo method [1]. NGS data-based group testing methods for continuous phenotype have also been proposed by us using a linear model framework which can handle continuous responses [2]. In this paper, we extend our linear model-based framework to a generalized linear model-based framework so that the methods can handle other types of responses especially binary responses which is commonly-faced in association studies. We have conducted extensive simulation studies to evaluate the performance of different estimators and compare our estimators with their corresponding genotype-based methods. We found that all methods have Type I errors controlled, and our NGS data-based testing methods have better performance than their corresponding genotype-based methods in the literature for other types of responses including binary responses (logistic regression) and count responses (Poisson regression especially when sequencing depth is low. In conclusion, we have extended our previous linear model (LM) framework to a generalized linear model (GLM) framework and derived NGS data-based testing methods for a group of genetic variants. Compared with our previously proposed LM-based methods [2], the new GLM-based methods can handle more complex responses (for example, binary responses and count responses) in addition to continuous responses. Our methods have filled the literature gap and shown advantage over their corresponding genotype-based methods in the literature.

Keywords: 62J05; 62P10; 92B15; Association study; Generalized linear model; Genotype calling; Group testing; Joint significance test; Next-generation sequencing; Rare variant; Score test; Variable collapse test.