The European Y-chromosomal short tandem repeat (STR) haplotype distribution has previously been analysed in various ways. Here, we introduce a new way of analysing population substructure using a new method based on clustering within the discrete Laplace exponential family that models the probability distribution of the Y-STR haplotypes. Creating a consistent statistical model of the haplotypes enables us to perform a wide range of analyses. Previously, haplotype frequency estimation using the discrete Laplace method has been validated. In this paper we investigate how the discrete Laplace method can be used for cluster analysis to further validate the discrete Laplace method. A very important practical fact is that the calculations can be performed on a normal computer. We identified two sub-clusters of the Eastern and Western European Y-STR haplotypes similar to results of previous studies. We also compared pairwise distances (between geographically separated samples) with those obtained using the AMOVA method and found good agreement. Further analyses that are impossible with AMOVA were made using the discrete Laplace method: analysis of the homogeneity in two different ways and calculating marginal STR distributions. We found that the Y-STR haplotypes from e.g. Finland were relatively homogeneous as opposed to the relatively heterogeneous Y-STR haplotypes from e.g. Lublin, Eastern Poland and Berlin, Germany. We demonstrated that the observed distributions of alleles at each locus were similar to the expected ones. We also compared pairwise distances between geographically separated samples from Africa with those obtained using the AMOVA method and found good agreement.
Keywords: AMOVA; Haplotype frequency; Population homogeneity; Population substructure; Subpopulation.
Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.