On Efficient and Accurate Calculation of Significance P-Values for Sequence Kernel Association Testing of Variant Set

Baolin Wu; Weihua Guan; James S Pankow

doi:10.1111/ahg.12144

Ann Hum Genet. 2016 Mar;80(2):123-35. doi: 10.1111/ahg.12144. Epub 2016 Jan 12.

Authors

Baolin Wu¹, Weihua Guan¹, James S Pankow²

Affiliations

¹ Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA.
² Division of Epidemiology and Community Health, School of Public Health, University of Minnesota, Minneapolis, MN, USA.

Abstract

The objective of this paper is to discuss and develop alternative computational methods for accurately and efficiently calculating significance P-values for the commonly used sequence kernel association test (SKAT) and the adaptive sum of SKAT and the burden test (SKAT-O) for variant set association. We show that existing software can yield either conservative or inflated type I errors. We develop alternative, efficient algorithms that quickly compute the SKAT P-value while keeping type I errors well controlled. In addition, we derive an alternative, simplified formula for the significance P-value of SKAT-O, which guides the development of efficient and accurate numerical algorithms. We implement the proposed methods in a publicly available R package that can be readily used in, or adapted to, large-scale sequencing studies. Because more and more large-scale exome and whole-genome sequencing or re-sequencing studies are being conducted, the proposed methods are of considerable practical importance. We conduct extensive numerical studies to investigate the performance of the proposed methods, and we further illustrate their usefulness by applying them to associations between rare exonic variants and fasting glucose levels in the Atherosclerosis Risk in Communities (ARIC) study.
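Background for the SKAT part of the abstract: in the original SKAT formulation, the score statistic Q is, under the null hypothesis, asymptotically distributed as a weighted sum of independent one-degree-of-freedom chi-square variables, with weights λ_j given by eigenvalues of a matrix built from the genotype kernel and the null model. Computing a SKAT P-value therefore reduces to evaluating the upper tail of such a mixture. The Python sketch below shows one standard way to do that, Imhof's (1961) numerical inversion, integrated with composite Simpson's rule. It illustrates the general problem only and is not the algorithm proposed in this paper. The function name `imhof_upper_tail`, the truncation point `upper`, and the step count `steps` are hypothetical choices for this example, and the eigenvalues are assumed to be already computed and nonnegative.

```python
import math


def imhof_upper_tail(q, lambdas, upper=2000.0, steps=100_000):
    """Approximate P(sum_j lambdas[j] * X_j > q), X_j i.i.d. chi-square(1).

    Sketch of Imhof (1961) inversion for the central case (hypothetical
    helper, not the paper's method). The infinite integral is truncated at
    `upper` and evaluated with composite Simpson's rule; `upper` and `steps`
    are illustrative defaults, not tuned values.
    """
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of subintervals

    def integrand(u):
        if u == 0.0:
            # Limit of sin(theta(u)) / (u * rho(u)) as u -> 0.
            return 0.5 * (sum(lambdas) - q)
        theta = 0.5 * sum(math.atan(lam * u) for lam in lambdas) - 0.5 * q * u
        rho = math.prod((1.0 + (lam * u) ** 2) ** 0.25 for lam in lambdas)
        return math.sin(theta) / (u * rho)

    h = upper / steps
    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        # Simpson weights: 4 at odd interior nodes, 2 at even interior nodes.
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    integral = total * h / 3.0
    return 0.5 + integral / math.pi


if __name__ == "__main__":
    # 1*chi2_1 + 1*chi2_1 is chi2_2, whose upper tail is exp(-q / 2).
    print(imhof_upper_tail(4.0, [1.0, 1.0]), math.exp(-2.0))
```

For example, `imhof_upper_tail(4.0, [1.0, 1.0])` should be close to exp(-2) ≈ 0.1353, because the sum of two unit-weighted χ²_1 variables is a χ²_2 variable. Quadrature of this kind controls absolute rather than relative error, so tail probabilities smaller than that error are not reliable. This is exactly the regime of small P-values that matters in large-scale sequencing studies.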

Keywords: GWAS; SKAT; SKAT-O; sequencing data.

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Algorithms
Atherosclerosis / genetics
Blood Glucose / analysis
Computer Simulation
Exome*
Genetic Association Studies / methods*
Genetic Variation*
Glucose-6-Phosphatase / genetics
Humans
Models, Genetic*
Software

Substances

Blood Glucose
Glucose-6-Phosphatase
G6PC2 protein, human