The accuracy of statistical methods for estimation of haplotype frequencies: an example from the CD4 locus

Am J Hum Genet. 2000 Aug;67(2):518-22. doi: 10.1086/303000. Epub 2000 Jun 19.


Haplotype analysis has become increasingly important for the study of human disease as well as for reconstruction of human population histories. Computer programs have been developed to estimate haplotype frequencies statistically from marker phenotypes in unrelated individuals. However, there currently are few empirical reports on the accuracy of statistical estimates that must infer linkage phase. We have analyzed haplotypes at the CD4 locus on chromosome 12 that consist of a short tandem-repeat polymorphism and an Alu insertion/deletion polymorphism located 9.8 kb apart, in 398 individuals from 10 geographically diverse sub-Saharan African populations. Haplotype frequency estimates obtained using gene counting based on molecularly haplotyped (phase-known) data were compared with haplotype frequency estimates obtained using the expectation-maximization algorithm. We show that the estimated frequencies of common haplotypes do not differ significantly with the use of phase-known versus phase-unknown data. However, rare haplotypes are occasionally miscalled when their presence/absence must be inferred. Thus, for those research questions for which the common haplotypes are most important, frequency estimates based on the phase-unknown marker-typing results from unrelated individuals will be sufficient. However, in cases where knowledge of rare haplotypes is critical, molecular haplotyping will be necessary to determine linkage phase unambiguously.
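The comparison described in the abstract, gene counting on phase-known data versus expectation-maximization (EM) on phase-unknown data, can be illustrated with a minimal sketch. This is not the software used in the study. It assumes two loci, unrelated individuals and Hardy-Weinberg proportions, and the function names (`gene_count`, `compatible_pairs`, `em_haplotype_freqs`) and the allele labels in the usage note are hypothetical.

```python
from collections import Counter


def gene_count(phased):
    """Haplotype frequencies by direct counting of phase-known haplotype pairs.

    phased: list of (haplotype1, haplotype2), each haplotype a (locusA, locusB) tuple.
    """
    counts = Counter(h for pair in phased for h in pair)
    total = sum(counts.values())
    return {h: c / total for h, c in counts.items()}


def compatible_pairs(genotype):
    """Distinct unordered haplotype pairs consistent with an unphased genotype.

    genotype: ((a1, a2), (b1, b2)), the two alleles at locus A and at locus B.
    Only a double heterozygote yields two different resolutions.
    """
    (a1, a2), (b1, b2) = genotype
    pairs = {
        tuple(sorted([(a1, b1), (a2, b2)])),
        tuple(sorted([(a1, b2), (a2, b1)])),
    }
    return sorted(pairs)


def em_haplotype_freqs(genotypes, n_iter=200, tol=1e-10):
    """EM estimate of two-locus haplotype frequencies from unphased genotypes.

    Assumption: random mating (Hardy-Weinberg), so a haplotype pair (h1, h2)
    has probability p1*p2 if h1 == h2 and 2*p1*p2 otherwise.
    """
    resolutions = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for res in resolutions for pair in res for h in pair})
    freqs = {h: 1.0 / len(haps) for h in haps}  # uniform starting values

    for _ in range(n_iter):
        expected = dict.fromkeys(haps, 0.0)
        for res in resolutions:
            # E-step: posterior weight of each possible phase resolution.
            weights = [
                freqs[h1] * freqs[h2] * (1 if h1 == h2 else 2) for h1, h2 in res
            ]
            total = sum(weights)
            for (h1, h2), w in zip(res, weights):
                expected[h1] += w / total
                expected[h2] += w / total
        # M-step: expected haplotype counts divided by number of chromosomes.
        new = {h: c / (2 * len(genotypes)) for h, c in expected.items()}
        delta = max(abs(new[h] - freqs[h]) for h in haps)
        freqs = new
        if delta < tol:
            break
    return freqs
```

As a usage example, a hypothetical genotype such as `((85, 90), ("+", "-"))` (two repeat alleles at one locus, insertion present/absent at the other) is a double heterozygote, so `compatible_pairs` returns two phase resolutions and EM apportions that individual between them. Haplotypes that appear only in such ambiguous resolutions are the ones most likely to be misestimated, which mirrors the abstract's point about rare haplotypes.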

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Africa South of the Sahara
  • Algorithms
  • Alu Elements / genetics
  • CD4 Antigens / genetics*
  • Chromosomes, Human, Pair 12 / genetics
  • Gene Frequency / genetics*
  • Genetic Markers / genetics
  • Haplotypes / genetics*
  • Heterozygote
  • Humans
  • Linkage Disequilibrium / genetics
  • Mutation / genetics
  • Polymorphism, Genetic / genetics
  • Research Design
  • Sensitivity and Specificity
  • Software
  • Statistics as Topic / methods*
  • Tandem Repeat Sequences / genetics


Substances

  • CD4 Antigens
  • Genetic Markers