Comparative Study
Am J Hum Genet. 2003 Jul;73(1):86-94.
doi: 10.1086/376438. Epub 2003 May 20.

Minimum Description Length Block Finder, a Method to Identify Haplotype Blocks and to Compare the Strength of Block Boundaries

Free PMC article

H Mannila et al. Am J Hum Genet. 2003.

Retraction in

Abstract

We describe a new probabilistic method for finding haplotype blocks that is based on the use of the minimum description length (MDL) principle. We give a rigorous definition of the quality of a segmentation of a genomic region into blocks and describe a dynamic programming algorithm for finding the optimal segmentation with respect to this measure. We also describe a method for finding the probability of a block boundary for each pair of adjacent markers: this gives a tool for evaluating the significance of each block boundary. We have applied the method to the published data of Daly and colleagues. The results expose some problems that exist in the current methods for the evaluation of the significance of predicted block boundaries. Our method, MDL block finder, can be used to compare block borders in different sample sets, and we demonstrate this by applying the MDL-based method to define the block structure in chromosomes from population isolates.
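
The dynamic programming step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' scoring function: `block_cost` below is a hypothetical, simplified two-part code length (bits to list the distinct haplotype patterns in a block, plus bits to assign each chromosome to one pattern), and the names `block_cost` and `optimal_segmentation` are assumptions made for illustration. Only the recurrence itself, choosing the best last block for every prefix of the marker sequence, reflects the general technique.

```python
from math import log2


def block_cost(haplotypes, start, end):
    """Toy description length, in bits, of markers [start, end) as one block.

    Assumption: biallelic markers coded as '0'/'1' strings. This is a
    stand-in for an MDL score, not the scoring function of the paper.
    """
    distinct = {h[start:end] for h in haplotypes}
    # Model part: one bit per allele for every distinct pattern in the block.
    codebook_bits = len(distinct) * (end - start)
    # Data part: each chromosome names the pattern it carries.
    assignment_bits = len(haplotypes) * log2(len(distinct))
    return codebook_bits + assignment_bits


def optimal_segmentation(haplotypes):
    """Return (total cost, list of (start, end) blocks) minimizing the cost."""
    n = len(haplotypes[0])
    best = [0.0] + [float("inf")] * n  # best[j]: cheapest cover of markers [0, j)
    back = [0] * (n + 1)               # back[j]: start of the last block in that cover
    for end in range(1, n + 1):
        for start in range(end):
            total = best[start] + block_cost(haplotypes, start, end)
            if total < best[end]:
                best[end] = total
                back[end] = start
    # Walk the back-pointers to recover the block boundaries.
    blocks = []
    end = n
    while end > 0:
        blocks.append((back[end], end))
        end = back[end]
    blocks.reverse()
    return best[n], blocks


# Two pairs of perfectly linked markers, with free recombination between the pairs.
print(optimal_segmentation(["0000", "0011", "1100", "1111"]))
# (16.0, [(0, 2), (2, 4)])
```

With this toy cost the sketch evaluates every candidate block once per marker pair, so the number of cost evaluations grows quadratically with the number of markers. Any other additive block score could be substituted for `block_cost` without changing the recurrence.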

Figures

Figure 1
Haplotype block structure observed in the data of Daly et al. (2001). Marker number is given on the X-axis. a, The optimal block structure produced by the MDL scoring function, compared against the block boundaries reported by Daly et al. (2001). The numbers associated with the MDL blocks give the number of haplotype classes that suffice to cover at least 85% of the block and, in parentheses, the total number of the classes. The numbers associated with the Daly et al. (2001) blocks give the number of haplotypes in the block that suffice to cover at least 90% of the block. b, The log odds of the probability of block boundaries for each pair of adjacent markers. c, The optimal segmentation when k markers are allowed to be left outside the blocks, for varying k.
Figure 2
Haplotype block structure observed in the data of Daly et al. (2001) with added noise. The physical location of each marker is given on the X-axis. a, The block boundaries reported by Daly et al. (2001) and the optimal block structures produced by the MDL scoring function when 0%, 5%, and 10% random noise was added to the data and when the order of markers was randomly permuted. b, The log odds of the probability of block boundaries for each pair of adjacent markers, after adding noise and permuting the order.
Figure 3
Haplotype block structure in the data from three subpopulations in Finland. Sample sizes are as follows: late settlement, n=108; early settlement, n=32; and subisolate of late settlement, n=108. The physical locations of the markers are given on the X-axis. a, The optimal block structure produced by the MDL scoring function in the three subpopulations. The numbers refer to the number of haplotype classes covering at least 85% of haplotypes and, in parentheses, the full block. b, The log odds of the probability of block boundaries for each pair of adjacent markers. c, The subisolate; the optimal segmentation when k markers are allowed to be left outside the blocks, for varying k.
