Detecting SNP-expression associations: a comparison of mutual information and median test with standard statistical approaches

Stat Med. 2009 Dec 20;28(29):3581-96. doi: 10.1002/sim.3695.

Abstract

Single nucleotide polymorphism (SNP)-gene expression associations have received increasing interest. The aim of such studies is to discover differences in the location parameters of gene expression across genotypes. Because gene expression data are often highly skewed or heavy-tailed, and data from different genotypes may vary in dispersion, the median is the most appropriate measure of location. In this case, the model assumptions of standard statistical methods for comparing locations, such as the analysis of variance (ANOVA) or the Kruskal-Wallis (KW) test, are violated. Alternatives that may be more appropriate are the median test (MED) and tests based on mutual information (MI). In simulation studies, these approaches and a novel MI test are compared with ANOVA and KW, varying the location, dispersion and skewness parameters of the gene expression distributions given genotype as well as the genotype frequencies. The MED test and the novel MI-based method maintain the nominal significance level for comparing medians when gene expression data are non-normally distributed, whereas ANOVA and KW have substantially inflated type I error rates. ANOVA and KW are, however, optimal if the standard model assumptions are fulfilled. The MED test generally has greater power than the MI tests and is therefore recommended if the model assumptions of standard procedures are violated. A 300 kb region on chromosome 9p21.3, which is associated with coronary artery disease, was analyzed using HapMap data. Only the alternative approaches were able to identify three genes (ADM, FCGR3B and ADORA1) as promising candidates to clarify the molecular mechanism underlying the genetic association.
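To make the two alternative approaches concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each genotype group (for example AA, Aa, aa) is given as a list of expression values, and that expression is dichotomized at the pooled median. The median test is Mood's test on the resulting 2 x k table (Pearson chi-square, no continuity correction). The MI test shown here is an assumption for illustration only: it computes the plug-in mutual information between genotype and the above/below-median indicator and refers G = 2nI (natural logarithm) to a chi-square distribution; the paper's novel MI test may use a different discretization or null distribution. The function names (`chi2_sf`, `split_at_median`, `median_test`, `mi_test`) are hypothetical, and the closed-form chi-square tail probabilities cover only 2 or 3 genotype groups.

```python
import math
from statistics import median


def chi2_sf(x, df):
    """Upper-tail chi-square probability; closed forms for df = 1 or 2 only
    (enough for SNPs with 2 or 3 observed genotype groups)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("closed form implemented only for df in {1, 2}")


def split_at_median(groups):
    """Build a 2 x k table: per group, counts above vs. at-or-below the pooled median."""
    pooled = [x for g in groups for x in g]
    grand = median(pooled)
    above = [sum(1 for x in g if x > grand) for g in groups]
    below = [len(g) - a for g, a in zip(groups, above)]
    return above, below


def median_test(groups):
    """Mood's median test: Pearson chi-square on the 2 x k table, df = k - 1."""
    above, below = split_at_median(groups)
    n = sum(above) + sum(below)
    stat = 0.0
    for row in (above, below):
        row_total = sum(row)
        for obs, g in zip(row, groups):
            expected = row_total * len(g) / n
            if expected > 0:
                stat += (obs - expected) ** 2 / expected
    return stat, chi2_sf(stat, len(groups) - 1)


def mi_test(groups):
    """Hypothetical MI test: plug-in MI (nats) between genotype and the
    above/below-median indicator, with G = 2 * n * MI ~ chi2(k - 1)."""
    above, below = split_at_median(groups)
    n = sum(above) + sum(below)
    mi = 0.0
    for row in (above, below):
        p_row = sum(row) / n
        for obs, g in zip(row, groups):
            if obs > 0:
                p_joint = obs / n
                mi += p_joint * math.log(p_joint / (p_row * len(g) / n))
    g_stat = max(0.0, 2.0 * n * mi)  # guard against tiny negative rounding error
    return mi, chi2_sf(g_stat, len(groups) - 1)


if __name__ == "__main__":
    # Toy expression values for three genotype groups with shifted medians.
    aa = [1.0, 1.2, 0.9, 1.1, 1.3, 0.8]
    ab = [2.0, 2.2, 1.9, 2.4, 2.1, 2.3]
    bb = [3.1, 2.9, 3.3, 3.0, 3.2, 3.4]
    print(median_test([aa, ab, bb]))  # statistic 12.0, p = exp(-6) ~ 0.0025
    print(mi_test([aa, ab, bb]))

    # Equal medians (0) but very different dispersion: the median test statistic is 0.
    print(median_test([[-1, 0, 1], [-5, 0, 5], [-10, 0, 10]]))
```

Because both sketches only count values above or below the pooled median, groups that share a median but differ in spread give a statistic of 0, which illustrates why median-based tests are insensitive to dispersion differences of the kind described above. For general use, SciPy provides `scipy.stats.median_test`, which also supports other tie-handling rules and a continuity correction.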

Publication types

  • Comparative Study

MeSH terms

  • Computer Simulation
  • Coronary Artery Disease / genetics
  • Cyclin-Dependent Kinase Inhibitor p15 / genetics
  • Cyclin-Dependent Kinase Inhibitor p16 / genetics
  • Gene Expression Profiling / methods*
  • Genome-Wide Association Study / methods*
  • Humans
  • Models, Genetic*
  • Models, Statistical*
  • Monte Carlo Method
  • Polymorphism, Single Nucleotide / genetics*

Substances

  • Cyclin-Dependent Kinase Inhibitor p15
  • Cyclin-Dependent Kinase Inhibitor p16