Haplotypes are useful for both fine-mapping of susceptibility loci and evaluation of sequence variation at multiple sites along a chromosome. However, they are difficult to measure directly over long stretches of DNA in diploid organisms. Consequently, multiple genetic markers are typically measured without linkage phase information, giving rise to a subject's diplotype. From diplotype data, haplotypes are often inferred from pedigree information, or treated as partially missing data when haplotype frequencies are estimated among unrelated subjects. This phase ambiguity among unrelated subjects can increase the variance of the estimated haplotype frequencies. Douglas et al. (Nat. Genet. 28:361-364) recently quantified the efficiency of estimating haplotype frequencies from the diplotypes of unrelated subjects, relative to directly measured haplotypes via somatic cell hybrids (conversion technology), and demonstrated that unknown linkage phase can lead to a large loss of efficiency. However, their results assumed linkage equilibrium among marker loci, which may not be realistic for closely linked markers. We extend their relative efficiency calculations in several respects: 1) allowance for linkage disequilibrium (LD) among marker loci; 2) evaluation of different patterns of LD; and 3) evaluation of nuclear families with and without parents. We show that although the loss in efficiency of haplotype frequency estimation among unrelated subjects decreases as LD increases to its maximum value, the general conclusions of Douglas et al. (Nat. Genet. 28:361-364) hold true for a variety of LD patterns and magnitudes. However, our results also demonstrate that trios of two parents plus one child are highly efficient for haplotype frequency estimation, that additional children offer little information, and that siblings without parents can be grossly inefficient. Genet. Epidemiol. 23:426-443, 2002.
Copyright 2002 Wiley-Liss, Inc.
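
To make the phase ambiguity concrete, the following is a minimal sketch, not the authors' method, that enumerates the haplotype pairs compatible with one unphased multilocus genotype. It assumes biallelic markers, and the 0/1/2 genotype coding and the function name `compatible_haplotype_pairs` are illustrative choices that do not come from the paper. A subject heterozygous at k markers is compatible with 2^(k-1) distinct haplotype pairs (for k >= 1), which is why unrelated subjects carry partially missing haplotype information.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """Return the set of unordered haplotype pairs consistent with an
    unphased genotype.

    genotype: sequence of 0, 1 or 2, the number of copies of allele "1"
    at each biallelic marker (an illustrative coding, assumed here).
    """
    # Heterozygous sites are the only ones whose phase is unknown.
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    pairs = set()
    for phase in product((0, 1), repeat=len(het_sites)):
        # Homozygous sites: 0 -> allele 0 on both haplotypes, 2 -> allele 1.
        # Heterozygous sites get a placeholder that is overwritten below.
        hap1 = [g // 2 for g in genotype]
        hap2 = list(hap1)
        for site, allele in zip(het_sites, phase):
            hap1[site] = allele
            hap2[site] = 1 - allele
        # Sort so that (h1, h2) and (h2, h1) count as the same pair.
        pairs.add(tuple(sorted((tuple(hap1), tuple(hap2)))))
    return pairs


if __name__ == "__main__":
    # Three heterozygous markers: four compatible haplotype pairs.
    for pair in sorted(compatible_haplotype_pairs([1, 1, 1])):
        print(pair)
```

A genotype with at most one heterozygous marker yields a single pair, so its haplotypes are known without family data or conversion technology.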