A generalized linear mixed model association tool for biobank-scale data

Longda Jiang; Zhili Zheng; Hailing Fang; Jian Yang

doi:10.1038/s41588-021-00954-4

Nat Genet. 2021 Nov;53(11):1616-1621. doi: 10.1038/s41588-021-00954-4. Epub 2021 Nov 4.

Authors

Longda Jiang^#¹ ², Zhili Zheng^#¹, Hailing Fang² ³, Jian Yang⁴ ⁵ ⁶

Affiliations

¹ Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland, Australia.
² School of Life Sciences, Westlake University, Hangzhou, China.
³ Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, China.
⁴ Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland, Australia. jian.yang@westlake.edu.cn.
⁵ School of Life Sciences, Westlake University, Hangzhou, China. jian.yang@westlake.edu.cn.
⁶ Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, China. jian.yang@westlake.edu.cn.

^# Contributed equally.

PMID: 34737426
DOI: 10.1038/s41588-021-00954-4

Abstract

Compared with linear mixed model-based genome-wide association (GWA) methods, generalized linear mixed model (GLMM)-based methods have better statistical properties when applied to binary traits but are computationally much slower. In the present study, leveraging efficient sparse matrix-based algorithms, we developed a GLMM-based GWA tool, fastGWA-GLMM, that is severalfold to orders of magnitude faster than the state-of-the-art tools when applied to the UK Biobank (UKB) data and scalable to cohorts with millions of individuals. We show by simulation that the fastGWA-GLMM test statistics of both common and rare variants are well calibrated under the null, even for traits with extreme case-control ratios. We applied fastGWA-GLMM to the UKB data of 456,348 individuals, 11,842,647 variants and 2,989 binary traits (full summary statistics available at http://fastgwa.info/ukbimpbin), and identified 259 rare variants associated with 75 traits, demonstrating the use of imputed genotype data in a large cohort to discover rare variants for binary complex traits.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Adult
Aged
Algorithms*
Biological Specimen Banks* / statistics & numerical data
Case-Control Studies
Genetic Variation
Genome-Wide Association Study / statistics & numerical data
Genotype
Humans
Linear Models*
Middle Aged
Models, Genetic*
Phenotype
United Kingdom
