Exploiting genome structure in association analysis

J Comput Biol. 2014 Apr;21(4):345-60. doi: 10.1089/cmb.2009.0224. Epub 2011 May 6.

Abstract

A genome-wide association study examines a large number of single-nucleotide polymorphisms (SNPs) to identify those that are significantly associated with a given phenotype while keeping the false positive rate low. Haplotype-based association methods have been proposed to exploit the correlation among nearby SNPs in linkage disequilibrium, but none of these methods directly incorporates structural information such as recombination events along the chromosome. In this paper, we propose a new approach to association mapping, called stochastic block lasso. It exploits prior knowledge of linkage disequilibrium structure in the genome, such as recombination rates and distances between adjacent SNPs, to increase the power to detect true associations while reducing false positives. Following a standard linear regression framework with the genotypes as inputs and the phenotype as output, our method places a sparsity-inducing Laplacian prior on the regression coefficients. This prior is augmented by a first-order Markov process along the sequence of SNPs that encodes prior information on linkage disequilibrium structure. The Markov-chain prior models the structural dependency between each pair of adjacent SNPs and allows us to search for associated SNPs in a coupled manner, borrowing strength from multiple nearby SNPs. Our results on HapMap-simulated datasets and mouse datasets show a significant advantage in incorporating prior knowledge of linkage disequilibrium structure for marker identification in whole-genome association studies.
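The abstract names two ingredients: a Laplacian (L1) prior on the regression coefficients and a first-order Markov chain along the SNP sequence whose transitions depend on recombination rate and inter-SNP distance. The Python sketch below illustrates each ingredient in isolation. It is not the authors' stochastic block lasso, which couples the two during inference. Under a Gaussian likelihood, the MAP estimate with an i.i.d. Laplace prior is the ordinary lasso, solved here by cyclic coordinate descent. The Markov prior uses a hypothetical parameterisation that is an assumption, not taken from the paper: between adjacent SNPs a binary "associated" indicator keeps its previous value with probability exp(-r * d), where r is a recombination rate and d a distance in matching units. Otherwise the indicator is redrawn with P(s = 1) = pi1. The names `soft_threshold`, `lasso_cd` and `markov_log_prior` are illustrative only.

```python
import math
from typing import List, Sequence


def soft_threshold(z: float, t: float) -> float:
    """Shrink z toward zero by t (proximal operator of t * |b|)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X: Sequence[Sequence[float]], y: Sequence[float],
             lam: float, n_iter: int = 200) -> List[float]:
    """Minimise (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent.

    With Gaussian noise and an i.i.d. Laplace prior on each b_j, this is the
    MAP estimate of the coefficients. Rows of X are individuals, columns SNPs.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - Xb; b starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # all-zero column carries no information
            # Correlation of SNP j with the residual that excludes SNP j's own fit
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


def markov_log_prior(states: Sequence[int], recomb_rate: Sequence[float],
                     dist: Sequence[float], pi1: float) -> float:
    """Log-probability of binary association indicators s_1..s_p under a
    HYPOTHETICAL first-order Markov chain along the SNP sequence.

    recomb_rate[k] and dist[k] describe the gap between SNP k and SNP k+1
    (assumed to be in matching units). Across each gap the chain keeps the
    previous state with probability exp(-rate * dist) and otherwise redraws
    from the stationary distribution P(s = 1) = pi1. Requires 0 < pi1 < 1 and
    rate * dist > 0 wherever the state changes.
    """
    pi = {0: 1.0 - pi1, 1: pi1}
    logp = math.log(pi[states[0]])
    for k in range(1, len(states)):
        keep = math.exp(-recomb_rate[k - 1] * dist[k - 1])
        same = 1.0 if states[k] == states[k - 1] else 0.0
        logp += math.log(keep * same + (1.0 - keep) * pi[states[k]])
    return logp


if __name__ == "__main__":
    # Two orthogonal (centred) SNP columns; only the first affects y.
    X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
    y = [2, -2, 2, -2]
    print(lasso_cd(X, y, lam=0.1))  # about [1.9, 0.0]: SNP 1 shrunk, SNP 2 zeroed

    rates, dists = [0.1] * 4, [1.0] * 4
    print(markov_log_prior([0, 1, 1, 1, 0], rates, dists, pi1=0.2))
    print(markov_log_prior([1, 0, 1, 0, 1], rates, dists, pi1=0.2))
```

With low recombination between adjacent SNPs, a contiguous block such as `[0, 1, 1, 1, 0]` receives a higher log-prior than the scattered `[1, 0, 1, 0, 1]` with the same number of associated SNPs. This mirrors the kind of coupling among nearby SNPs that the abstract describes. A gap with a higher recombination rate makes a state change across that gap cheaper.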

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Animals
  • Bayes Theorem
  • Computer Simulation
  • Genetic Markers
  • Genome
  • Genome-Wide Association Study / methods*
  • Haplotypes
  • Linear Models
  • Linkage Disequilibrium
  • Markov Chains
  • Mice
  • Models, Genetic*
  • Phenotype
  • Polymorphism, Single Nucleotide
  • ROC Curve
  • Signal-To-Noise Ratio
  • Stochastic Processes

Substances

  • Genetic Markers