Incorporating external information to improve sparse signal detection in rare-variant gene-set-based analyses

Genet Epidemiol. 2020 Jun;44(4):330-338. doi: 10.1002/gepi.22283. Epub 2020 Feb 11.

Abstract

Gene-set analyses are used to assess whether there is any evidence of association with disease among a set of biologically related genes. Such an analysis typically treats all genes within a set equally, even though there is substantial external information concerning the likely importance of each gene within each set. For example, for traits that are under purifying selection, we would expect genes showing extensive genic constraint to be more likely to be trait-associated than unconstrained genes. Here we improve gene-set analyses by incorporating such external information into a higher-criticism-based signal detection analysis. We show that when this external information is predictive of whether a gene is associated with disease, our approach can lead to a substantial increase in power. Further, our approach is particularly powerful when the signal is sparse, that is, when only a small number of genes within the set are associated with the trait. We illustrate our approach with a gene-set analysis of amyotrophic lateral sclerosis (ALS), in which we implicate a number of gene-sets containing SOD1 and NEK1 and show enrichment of small p values for gene-sets containing known ALS genes. We implement our approach in the R package wHC.
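The general idea of combining external weights with higher criticism can be illustrated with a minimal Python sketch. This is not the wHC package or its interface, and the abstract does not state the exact weighting scheme. The sketch assumes the standard Donoho-Jin higher criticism statistic and one common form of p-value weighting, in which each gene-level p value is divided by a weight rescaled to average 1. The function names, the `alpha0` truncation parameter, and the toy p values and weights are all hypothetical.

```python
import math


def weighted_p_values(p_values, weights):
    """Divide each p value by its weight, after rescaling the (positive)
    weights to average 1. Results are capped at 1.
    Assumption: this is one common weighting scheme, not necessarily wHC's."""
    mean_w = sum(weights) / len(weights)
    return [min(1.0, p / (w / mean_w)) for p, w in zip(p_values, weights)]


def higher_criticism(p_values, alpha0=0.5):
    """Donoho-Jin higher criticism statistic, maximised over the smallest
    alpha0 fraction of the sorted p values."""
    n = len(p_values)
    sorted_p = sorted(p_values)
    k_max = max(1, int(alpha0 * n))
    best = -math.inf
    for i, p in enumerate(sorted_p[:k_max], start=1):
        # Clip to keep the denominator strictly positive.
        p = min(max(p, 1e-300), 1.0 - 1e-12)
        z = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        best = max(best, z)
    return best


# Toy gene set (made-up numbers): two genes carry signal, and the
# hypothetical external weights (e.g. a constraint score) favour them.
gene_p = [0.001, 0.004, 0.30, 0.45, 0.52, 0.61, 0.70, 0.78, 0.85, 0.93]
gene_w = [4.0, 4.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

hc_unweighted = higher_criticism(gene_p)
hc_weighted = higher_criticism(weighted_p_values(gene_p, gene_w))
print(hc_unweighted, hc_weighted)
```

In this toy example the weights are informative, so the weighted statistic is larger than the unweighted one. Uninformative weights would not be expected to help. The sketch stops at the statistic itself and does not compute a gene-set p value, which would need a reference distribution for the weighted statistic.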

Keywords: amyotrophic lateral sclerosis; gene-set-based analysis; higher criticism; prior information; weighted p values.

MeSH terms

  • Amyotrophic Lateral Sclerosis / genetics*
  • Amyotrophic Lateral Sclerosis / pathology
  • Exome / genetics
  • Genetic Predisposition to Disease
  • Genetic Variation
  • Humans
  • NIMA-Related Kinase 1 / genetics
  • Superoxide Dismutase-1 / genetics
  • User-Computer Interface

Substances

  • SOD1 protein, human
  • Superoxide Dismutase-1
  • NEK1 protein, human
  • NIMA-Related Kinase 1