Methods of varying complexity have been proposed to efficiently estimate haplotype relative risks in case-control data. Our goal was to compare methods that estimate associations between disease conditions and common haplotypes in large case-control studies such that haplotype imputation is done once as a simple data-processing step. We performed a simulation study based on haplotype frequencies for two renin-angiotensin system genes. The iterative and noniterative methods we compared all involved fitting a weighted logistic regression, but differed in how the probability weights were specified. We also quantified the amount of ambiguity in the imputed diplotypes for the simulated genes. For one gene, there was essentially no uncertainty in the imputed diplotypes and every method performed well. For the other, approximately 60% of individuals had an unambiguous diplotype, and approximately 90% had a highest posterior probability greater than 0.75. For this gene, all methods performed well under no genetic effects, moderate effects, and strong effects tagged by a single nucleotide polymorphism (SNP). Noniterative methods produced biased estimates under strong effects not tagged by an SNP. When only the most likely diplotype was used, median bias of the log-relative risks ranged between -0.49 and 0.22 over all haplotypes. When all possible diplotypes were used, median bias ranged between -0.73 and 0.08. Results were similar under interaction with a binary covariate. Noniterative weighted logistic regression provides valid tests for genetic associations and reliable estimates of modest effects of common haplotypes, and can be implemented in standard software. The potential for phase ambiguity does not necessarily imply uncertainty in imputed diplotypes, especially in large studies of common haplotypes.
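The two noniterative weighting schemes mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each individual comes with a list of candidate diplotypes and their posterior probabilities from a single imputation step, that the covariate is a hypothetical indicator for carrying one haplotype of interest, and that the model has only an intercept and one slope. The function names and the toy data are invented for illustration, and the iterative method is not shown.

```python
import math

def fit_weighted_logistic(rows, n_iter=25):
    """Newton-Raphson fit of logit P(y=1) = b0 + b1*x to (x, y, weight) rows."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, w in rows:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            r = w * (y - p)            # weighted score contribution
            v = w * p * (1.0 - p)      # weighted information contribution
            g0 += r
            g1 += r * x
            h00 += v
            h01 += v * x
            h11 += v * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * delta = score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def rows_all_diplotypes(individuals):
    """One row per candidate diplotype, weighted by its posterior probability."""
    return [(x, status, prob)
            for status, candidates in individuals
            for x, prob in candidates]

def rows_most_likely(individuals):
    """One row per individual: the highest-probability diplotype, weight 1."""
    return [(max(candidates, key=lambda c: c[1])[0], status, 1.0)
            for status, candidates in individuals]

# Hypothetical toy data: (case status, [(carrier indicator, posterior prob), ...]).
individuals = (
    [(1, [(1, 0.75), (0, 0.25)])] * 4   # ambiguous cases
    + [(1, [(0, 1.0)])] * 2             # unambiguous cases
    + [(0, [(1, 0.25), (0, 0.75)])] * 4  # ambiguous controls
    + [(0, [(1, 1.0)])] * 1             # unambiguous controls
    + [(0, [(0, 1.0)])] * 2
)

_, slope_all = fit_weighted_logistic(rows_all_diplotypes(individuals))
_, slope_best = fit_weighted_logistic(rows_most_likely(individuals))
print("log-odds ratio, all possible diplotypes:", slope_all)
print("log-odds ratio, most likely diplotype:", slope_best)
```

In this sketch the two schemes differ only in how rows and weights are built before the same regression is fitted, which is why either could be run in standard software that accepts observation weights. Standard errors from such a fit would need separate care, which the sketch does not address.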
Copyright (c) 2006 Wiley-Liss, Inc.