A new algorithm for haplotype-based association analysis: the Stochastic-EM algorithm

Ann Hum Genet. 2004 Mar;68(Pt 2):165-77. doi: 10.1046/j.1529-8817.2003.00085.x.


It is now widely accepted that haplotypic information can be of great interest for investigating the role of a candidate gene in the etiology of complex diseases. In the absence of family data, haplotypes cannot be deduced unambiguously from genotypes, except for individuals who are homozygous at all loci or heterozygous at only one site. Statistical methodologies are therefore required for inferring haplotypes from genotypic data and testing their association with a phenotype of interest. Two maximum likelihood algorithms are often used in haplotype-based association studies: the Newton-Raphson (NR) and the Expectation-Maximisation (EM) algorithms. To circumvent the limitations of both algorithms, including convergence to local maxima and saddle points of the likelihood, we describe here how a stochastic version of the EM algorithm, referred to as SEM, can be used for testing haplotype-phenotype association. Statistical properties of the SEM algorithm were investigated through a simulation study covering a large range of practical situations, including small and large samples and rare and frequent haplotypes, and results were compared to those obtained with the standard NR algorithm. Our simulation study indicated that the SEM algorithm provides results similar to those of the NR algorithm, making the SEM algorithm of great interest for haplotype-based association analysis, especially when the number of polymorphisms is quite large.
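The abstract does not give the details of the SEM procedure, so the following is only a minimal sketch of the general idea, restricted to haplotype frequency estimation (no phenotype model) and assuming biallelic loci, genotypes coded as minor-allele counts, and Hardy-Weinberg equilibrium. The function names `compatible_pairs` and `sem_haplotype_frequencies`, and the parameters `n_iter`, `burn_in`, and `seed`, are hypothetical and not taken from the paper. The first function illustrates why only genotypes with at most one heterozygous site are unambiguous; the second replaces the expectation step of EM with a random draw of one haplotype pair per individual.

```python
import itertools
import random
from collections import Counter


def compatible_pairs(genotype):
    """List the unordered haplotype pairs consistent with an unphased genotype.

    genotype: tuple of minor-allele counts (0, 1 or 2), one per biallelic locus.
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [g // 2 for g in genotype]  # homozygous loci fix the allele on both haplotypes
    pairs = set()
    for bits in itertools.product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for i, b in zip(het, bits):
            h1[i], h2[i] = b, 1 - b  # heterozygous locus: one copy of each allele
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def sem_haplotype_frequencies(genotypes, n_iter=500, burn_in=100, seed=0):
    """Stochastic-EM sketch for haplotype frequencies (assumed setup, see lead-in)."""
    rng = random.Random(seed)
    candidates = [compatible_pairs(g) for g in genotypes]
    haplotypes = sorted({h for pairs in candidates for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform starting point
    running = Counter()
    n_chromosomes = 2 * len(genotypes)

    for iteration in range(n_iter):
        counts = Counter()
        # S-step: draw one haplotype pair per individual from its conditional
        # distribution given the current frequencies (Hardy-Weinberg weights).
        for pairs in candidates:
            weights = [(1 if a == b else 2) * freq[a] * freq[b] for a, b in pairs]
            a, b = rng.choices(pairs, weights=weights)[0]
            counts[a] += 1
            counts[b] += 1
        # M-step: maximum likelihood estimate from the completed (phased) data.
        freq = {h: counts[h] / n_chromosomes for h in haplotypes}
        # After burn-in, accumulate the estimates so they can be averaged.
        if iteration >= burn_in:
            for h in haplotypes:
                running[h] += freq[h]

    kept = n_iter - burn_in
    return {h: running[h] / kept for h in haplotypes}


if __name__ == "__main__":
    # A double heterozygote is compatible with two haplotype pairs.
    print(compatible_pairs((1, 1)))
    sample = [(0, 0), (2, 2), (1, 1), (1, 0), (0, 0), (1, 1), (2, 2), (0, 1)]
    print(sem_haplotype_frequencies(sample))
```

Averaging the post-burn-in iterates is one common way to summarise a stochastic EM chain; whether the paper uses this or another summary is not stated in the abstract. Note that in this simple version a haplotype whose sampled count drops to zero can never be drawn again, which a fuller implementation might want to handle.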

MeSH terms

  • Algorithms*
  • Haplotypes*
  • Humans
  • Likelihood Functions*
  • Models, Genetic
  • Models, Statistical
  • Polymorphism, Genetic*
  • Stochastic Processes