On Efficient and Accurate Calculation of Significance P-Values for Sequence Kernel Association Testing of Variant Set

Ann Hum Genet. 2016 Mar;80(2):123-35. doi: 10.1111/ahg.12144. Epub 2016 Jan 12.

Abstract

The objective of this paper is to discuss and develop alternative computational methods to accurately and efficiently calculate significance P-values for the commonly used sequence kernel association test (SKAT) and the adaptive sum of SKAT and burden test (SKAT-O) for variant set association. We show that existing software can lead to either conservative or inflated type I errors. We develop alternative, efficient computational algorithms that quickly compute the SKAT P-value and have well-controlled type I errors. In addition, we derive an alternative and simplified formula for calculating the significance P-value of SKAT-O, which sheds light on the development of efficient and accurate numerical algorithms. We implement the proposed methods in a publicly available R package that can be readily used or adapted to large-scale sequencing studies. Given that more and more large-scale exome and whole genome sequencing or re-sequencing studies are being conducted, the proposed methods are of considerable practical importance. We conduct extensive numerical studies to investigate the performance of the proposed methods. We further illustrate their usefulness with an application to associations between rare exonic variants and fasting glucose levels in the Atherosclerosis Risk in Communities (ARIC) study.
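As background for the computational problem described above: under the null hypothesis, the SKAT statistic is asymptotically distributed as a weighted sum of independent 1-df chi-square variables, so its P-value is a tail probability of that mixture. The Python sketch below illustrates one simple way to approximate such a tail probability, by matching the first two moments to a scaled chi-square (a Satterthwaite-type approximation). It is not the algorithm proposed in this paper, and crude approximations of this kind can be inaccurate far in the tail. The function names (`reg_gamma_q`, `chi2_sf`, `mixture_pvalue_two_moment`) and the example weights are hypothetical and chosen only for illustration.

```python
import math


def reg_gamma_q(a, x):
    """Regularized upper incomplete gamma function Q(a, x), for a > 0."""
    if x <= 0.0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Power series for the lower function P(a, x); return Q = 1 - P.
        term = 1.0 / a
        total = term
        n = a
        for _ in range(10000):
            n += 1.0
            term *= x / n
            total += term
            if abs(term) < abs(total) * 1e-16:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction for Q(a, x), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def chi2_sf(x, df):
    """Upper tail probability P(X > x) for a chi-square variable with df > 0."""
    return reg_gamma_q(df / 2.0, x / 2.0)


def mixture_pvalue_two_moment(q, weights):
    """Approximate P(sum_i w_i * chi2_1 > q) for positive weights w_i.

    Assumption: the mixture is replaced by scale * chi2_df, with scale and df
    chosen so that the mean and variance agree with those of the mixture.
    """
    s1 = sum(weights)                  # mean of the mixture
    s2 = sum(w * w for w in weights)   # half the variance of the mixture
    scale = s2 / s1
    df = s1 * s1 / s2
    return chi2_sf(q / scale, df)


if __name__ == "__main__":
    # Hypothetical mixture weights, standing in for eigenvalues of a kernel matrix.
    example_weights = [2.5, 1.0, 0.4, 0.1]
    print(mixture_pvalue_two_moment(12.0, example_weights))
```

When all weights are equal the mixture is exactly a scaled chi-square, so the two-moment approximation is exact in that case; with unequal weights it is only a rough approximation.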

Keywords: GWAS; SKAT; SKAT-O; sequencing data.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Atherosclerosis / genetics
  • Blood Glucose / analysis
  • Computer Simulation
  • Exome*
  • Genetic Association Studies / methods*
  • Genetic Variation*
  • Glucose-6-Phosphatase / genetics
  • Humans
  • Models, Genetic*
  • Software

Substances

  • Blood Glucose
  • Glucose-6-Phosphatase
  • G6PC2 protein, human