SNPs, haplotypes, and model selection in a candidate gene region: the SIMPle analysis for multilocus data

Genet Epidemiol. 2004 Dec;27(4):429-41. doi: 10.1002/gepi.20039.


Modern molecular techniques make discovery of numerous single nucleotide polymorphisms (SNPs) in candidate gene regions feasible. Conventional analysis relies on either independent tests of each variant or the use of haplotypes in association analysis. The first technique ignores the dependencies between SNPs. The second, though it may increase power, often introduces uncertainty by estimating haplotypes from population data. Additionally, as the number of loci in a haplotype grows, it becomes harder to determine which underlying genetic components drive a detected association. Here, we present a genotype-level analysis that jointly models the SNPs via a SNP interaction model with phase information (SIMPle) to capture the underlying haplotype structure. This analysis estimates both the risk associated with each variant and the importance of phase between pairwise combinations of SNPs. Thus, rather than selecting between genotype- or haplotype-level approaches, the SIMPle method frames the analysis of multilocus data in a model selection paradigm, with the aim of determining which SNPs, phase terms, and linear combinations best describe the relation between genetic variation and a trait of interest. To avoid unstable estimation due to sparse data and to incorporate both the dependencies among terms and the uncertainty in model selection, we propose a Bayes model averaging procedure. This highlights key SNPs and phase terms and yields a set of best representative models. Using simulations, we demonstrate the utility of the SIMPle model to identify crucial SNPs and underlying haplotype structures across a variety of causal models and genetic architectures.
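
As a rough illustration of the model averaging idea described above, the following Python sketch weights a handful of candidate models and sums those weights into per-term inclusion probabilities. It is not the procedure from the paper: the abstract does not specify priors or how the model space is searched, so this sketch substitutes the common BIC approximation to posterior model probabilities under equal prior model weights. The term names (`snp1`, `snp2`, `phase12`), the log-likelihoods, the parameter counts, and the sample size are all hypothetical values chosen for illustration.

```python
import math


def bma_weights(models, n_obs):
    """Approximate posterior model probabilities from BIC.

    models: dict mapping a tuple of term names to
            (maximized log-likelihood, number of parameters).
    Assumes equal prior probability for every model (an assumption
    of this sketch, not something stated in the abstract).
    """
    bic = {
        terms: -2.0 * loglik + k * math.log(n_obs)
        for terms, (loglik, k) in models.items()
    }
    best = min(bic.values())
    # Subtract the best BIC before exponentiating for numerical stability.
    raw = {terms: math.exp(-0.5 * (b - best)) for terms, b in bic.items()}
    total = sum(raw.values())
    return {terms: r / total for terms, r in raw.items()}


def inclusion_probabilities(weights):
    """Sum model weights over all models that contain each term."""
    probs = {}
    for terms, w in weights.items():
        for term in terms:
            probs[term] = probs.get(term, 0.0) + w
    return probs


# Hypothetical fitted models: SNP main effects plus one pairwise phase term.
candidate_models = {
    (): (-350.0, 1),
    ("snp1",): (-340.0, 2),
    ("snp2",): (-348.0, 2),
    ("snp1", "snp2"): (-339.0, 3),
    ("snp1", "snp2", "phase12"): (-333.0, 4),
}

weights = bma_weights(candidate_models, n_obs=500)
inclusion = inclusion_probabilities(weights)

for term, prob in sorted(inclusion.items()):
    print(f"{term}: {prob:.3f}")
```

Ranking terms by inclusion probability rather than by a single best-fitting model is what lets an averaging approach of this kind report both key terms and a set of representative models, which is the behavior the abstract describes.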

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Bayes Theorem
  • Chromosome Mapping
  • Genetic Predisposition to Disease / epidemiology
  • Genetics, Population*
  • Genome, Human
  • Genotype
  • Haplotypes*
  • Humans
  • Linkage Disequilibrium
  • Models, Genetic*
  • Models, Statistical
  • Polymorphism, Single Nucleotide / genetics*