Finding haplotype block boundaries by using the minimum-description-length principle

Am J Hum Genet. 2003 Aug;73(2):336-54. doi: 10.1086/377106. Epub 2003 Jul 11.


We present a method for detecting haplotype blocks that simultaneously uses information about linkage-disequilibrium decay between the blocks and the diversity of haplotypes within the blocks. By use of phased single-nucleotide polymorphism data, our method partitions a chromosome into a series of adjacent, nonoverlapping blocks. The partition is made by choosing among a family of Markov models for block structure in a chromosomal region. Specifically, in the model, the occurrence of haplotypes within blocks follows a time-inhomogeneous Markov process along the chromosome, and we choose among possible partitions by using the two-stage minimum-description-length criterion. When applied to data simulated from the coalescent with recombination hotspots, our method reliably situates block boundaries at the hotspots and infrequently places block boundaries at sites with background levels of recombination. We apply three previously published block-finding methods to the same data, showing that they either are relatively insensitive to recombination hotspots or fail to discriminate between background sites of recombination and hotspots. When applied to the 5q31 data of Daly et al., our method identifies more block boundaries in agreement with those found by Daly et al. than do other methods. These results suggest that our method may be useful for designing association-based mapping studies that exploit haplotype blocks.
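The partition search described above can be illustrated with a much-simplified sketch. It is not the authors' model. It assumes biallelic SNPs coded as '0'/'1', treats blocks as independent (dropping the Markov dependence between adjacent blocks), and charges nothing for encoding the boundary positions. The function names `block_cost` and `mdl_partition` are hypothetical. Under those assumptions, a two-stage description length (bits to state the model plus bits to state the data given the fitted model) can be minimized over all contiguous partitions by dynamic programming:

```python
from collections import Counter
from math import log2


def block_cost(haps, start, end):
    """Two-stage description length (bits) of SNP columns [start, end) as one block."""
    n = len(haps)
    counts = Counter(h[start:end] for h in haps)
    k = len(counts)
    # Stage 1 (model): list the k distinct haplotypes at one bit per SNP,
    # plus k - 1 free frequencies at (1/2) * log2(n) bits each.
    model_bits = k * (end - start) + 0.5 * (k - 1) * log2(n)
    # Stage 2 (data given model): negative log-likelihood at the
    # maximum-likelihood haplotype frequencies.
    data_bits = -sum(c * log2(c / n) for c in counts.values())
    return model_bits + data_bits


def mdl_partition(haps):
    """Return (boundaries, total_bits) for the minimum-cost block partition.

    A boundary b means that a new block starts at SNP index b.
    """
    m = len(haps[0])
    best = [0.0] + [float("inf")] * m  # best[j]: minimum cost of the first j SNPs
    back = [0] * (m + 1)               # back[j]: start of the last block ending at j
    for j in range(1, m + 1):
        for i in range(j):
            cost = best[i] + block_cost(haps, i, j)
            if cost < best[j]:
                best[j], back[j] = cost, i
    # Walk the back-pointers to recover the interior boundaries.
    bounds = []
    j = m
    while j > 0:
        j = back[j]
        if j > 0:
            bounds.append(j)
    return bounds[::-1], best[m]
```

Calling `mdl_partition` on a list of equal-length phased haplotype strings returns the interior boundary indices and the total cost in bits. A boundary is placed only where splitting lowers the total, which in this toy cost happens when the haplotypes on either side combine freely.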

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms
  • Chromosomes, Human, Pair 5 / genetics
  • DNA, Mitochondrial / genetics
  • Haplotypes*
  • Humans
  • Linkage Disequilibrium
  • Markov Chains
  • Models, Genetic*
  • Models, Statistical*
  • Polymorphism, Single Nucleotide
  • Recombination, Genetic

