Determination of probability distribution of diplotype configuration (diplotype distribution) for each subject from genotypic data using the EM algorithm

Ann Hum Genet. 2002 May;66(Pt 3):183-93. doi: 10.1017/S0003480002001124.


Haplotype analysis is important for mapping traits. Recently, methods based on the expectation-maximization (EM) algorithm have been developed for estimating haplotype frequencies from the genotypes of unrelated individuals. Our program estimates haplotype frequencies in the population and, based on these estimates, determines the posterior probability distribution of diplotype configuration (diplotype distribution) for each subject. We analyzed samples from three ethnic groups for the smoothelin gene (SMTN) and samples from three Japanese groups for the serum amyloid A genes (SAA@). In most cases, the estimated diplotype distribution for each individual was concentrated in a single diplotype configuration. With one exception, the diplotype configuration thus determined was the same as that determined experimentally in vitro. Thus, diplotype configurations determined using haplotype frequencies estimated from unrelated individuals are reliable. When a phenotype is associated with diplotype configurations, this method allows the risk of a subject developing that phenotype to be estimated from the subject's diplotype distribution.
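The two steps described above (EM estimation of haplotype frequencies, then a posterior diplotype distribution per subject) can be illustrated with a minimal Python sketch. This is not the authors' program. It assumes biallelic SNPs coded as 0/1/2 copies of one allele, Hardy-Weinberg proportions, and no missing data; all function names, and the penetrance table passed to `expected_risk`, are hypothetical.

```python
from collections import defaultdict
from itertools import product


def compatible_diplotypes(genotype):
    """List unordered haplotype pairs consistent with an unphased genotype.

    genotype: tuple of 0/1/2 = copies of allele 1 at each biallelic locus
    (an assumed coding, not taken from the paper).
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [g // 2 for g in genotype]  # homozygous loci are fixed
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for i, b in zip(het, bits):  # try every phase of the het loci
            h1[i], h2[i] = b, 1 - b
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def diplotype_distribution(genotype, freq):
    """Posterior probability of each compatible diplotype, assuming HWE."""
    pairs = compatible_diplotypes(genotype)
    weights = []
    for h1, h2 in pairs:
        p = freq.get(h1, 0.0) * freq.get(h2, 0.0)
        weights.append(p if h1 == h2 else 2.0 * p)
    total = sum(weights)
    return {pair: w / total for pair, w in zip(pairs, weights)}


def em_haplotype_frequencies(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies from unphased genotypes by EM."""
    haps = sorted({h for g in genotypes
                   for pair in compatible_diplotypes(g) for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting point
    for _ in range(n_iter):
        counts = defaultdict(float)
        for g in genotypes:  # E-step: expected haplotype counts
            for (h1, h2), post in diplotype_distribution(g, freq).items():
                counts[h1] += post
                counts[h2] += post
        # M-step: normalize by the number of chromosomes
        new = {h: counts[h] / (2 * len(genotypes)) for h in haps}
        delta = max(abs(new[h] - freq[h]) for h in haps)
        freq = new
        if delta < tol:
            break
    return freq


def expected_risk(distribution, penetrance):
    """Average a hypothetical per-diplotype risk over the posterior."""
    return sum(p * penetrance.get(d, 0.0) for d, p in distribution.items())


# Toy data (illustrative only): two SNPs, ten subjects.
genotypes = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
freq = em_haplotype_frequencies(genotypes)
dist = diplotype_distribution((1, 1), freq)
for diplotype, prob in dist.items():
    print(diplotype, round(prob, 4))
```

In this toy data set the double heterozygote `(1, 1)` is the only ambiguous genotype. Because the haplotypes `(0, 0)` and `(1, 1)` are common among the homozygous subjects, the posterior concentrates on the single diplotype pairing those two haplotypes, which mirrors the behaviour reported in the abstract.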

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Apolipoproteins / genetics
  • Base Sequence
  • Cytoskeletal Proteins / genetics
  • Haplotypes / genetics*
  • Humans
  • Molecular Sequence Data
  • Muscle Proteins / genetics
  • Polymorphism, Single Nucleotide
  • Sequence Analysis, DNA
  • Serum Amyloid A Protein / genetics


Substances

  • Apolipoproteins
  • Cytoskeletal Proteins
  • Muscle Proteins
  • SMTN protein, human
  • Serum Amyloid A Protein