The discrete Laplace exponential family and estimation of Y-STR haplotype frequencies

J Theor Biol. 2013 Jul 21;329:39-51. doi: 10.1016/j.jtbi.2013.03.009. Epub 2013 Mar 21.

Abstract

Estimating haplotype frequencies is important in, e.g., forensic genetics, where the frequencies are needed to calculate the likelihood ratio for the evidential weight of a DNA profile found at a crime scene. Estimation is naturally based on a population model, which motivates the investigation of the Fisher-Wright model of evolution for haploid lineage DNA markers. We describe an exponential family called the 'discrete Laplace distribution'. Exponential families are a well-understood class of probability distributions for which inference is readily carried out with existing software. We illustrate how well the discrete Laplace distribution approximates a more complicated distribution that arises from the well-known Fisher-Wright model of evolution under a single-step mutation process. We show how the discrete Laplace distribution can be used to estimate haplotype frequencies for haploid lineage DNA markers (such as Y-chromosomal short tandem repeats), which in turn can be used to assess the evidential weight of a DNA profile found at a crime scene. This is done by performing inference in a mixture of multivariate, marginally independent, discrete Laplace distributions, using the EM algorithm to estimate the probabilities of membership of a set of unobserved subpopulations. The discrete Laplace distribution can be used to estimate haplotype frequencies with lower prediction error than other existing estimators, and the calculations can be performed on an ordinary desktop computer. The method is implemented in the freely available open-source software R, which is supported on Linux, MacOS and MS Windows.
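The mixture model described in the abstract can be illustrated with a short Python sketch. It assumes the usual discrete Laplace parametrization f(x; p, y) = (1 - p)/(1 + p) * p^|x - y| for an integer allele x, a location (central allele) y and a dispersion 0 < p < 1, and a haplotype probability of the form P(h) = sum_j tau_j * prod_k f(h_k; p_jk, y_jk), where tau_j is the membership probability of unobserved subpopulation j and k runs over loci. This is a simplified illustration, not the authors' R implementation: the M-step choices (weighted medians for the centers, a closed-form maximum-likelihood update for each p_jk, a fixed number of iterations) and all function names are hypothetical choices made for this example.

```python
import math
import random

def dlaplace_pmf(x, p, y):
    """Discrete Laplace pmf at integer x, with location y and dispersion 0 < p < 1."""
    return (1.0 - p) / (1.0 + p) * p ** abs(x - y)

def log_pmf(x, p, y):
    """Log of the pmf, computed directly to avoid underflow."""
    return math.log((1.0 - p) / (1.0 + p)) + abs(x - y) * math.log(p)

def haplotype_prob(h, tau, centers, p):
    """Mixture probability P(h) = sum_j tau_j * prod_k f(h_k; p_jk, y_jk)."""
    total = 0.0
    for j in range(len(tau)):
        comp = 1.0
        for k, allele in enumerate(h):
            comp *= dlaplace_pmf(allele, p[j][k], centers[j][k])
        total += tau[j] * comp
    return total

def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half of the total weight."""
    pairs = sorted(zip(values, weights))
    half = 0.5 * sum(weights)
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= half:
            return v
    return pairs[-1][0]

def p_mle(m, eps=1e-6):
    """MLE of p given the weighted mean absolute deviation m: solves 2p/(1 - p^2) = m."""
    if m <= 0:
        return eps
    return min(max((math.sqrt(1.0 + m * m) - 1.0) / m, eps), 1.0 - eps)

def fit_mixture(H, K, n_iter=50, seed=0):
    """Hypothetical EM sketch for a K-component mixture of marginally independent
    discrete Laplace distributions; H is a list of integer haplotypes (one repeat count per locus)."""
    rng = random.Random(seed)
    n, r = len(H), len(H[0])
    centers = [list(h) for h in rng.sample(H, K)]  # initial centers: K observed haplotypes
    p = [[0.3] * r for _ in range(K)]
    tau = [1.0 / K] * K
    for _ in range(n_iter):
        # E-step: posterior membership probabilities W[i][j], stabilized by subtracting the max log
        W = []
        for h in H:
            logs = [math.log(tau[j]) + sum(log_pmf(h[k], p[j][k], centers[j][k]) for k in range(r))
                    for j in range(K)]
            mx = max(logs)
            e = [math.exp(v - mx) for v in logs]
            s = sum(e)
            W.append([v / s for v in e])
        # M-step: mixing weights, then per locus a center and a dispersion
        for j in range(K):
            wj = [W[i][j] for i in range(n)]
            tot = sum(wj) + 1e-12  # guard against an empty component
            tau[j] = max(tot / n, 1e-12)
            for k in range(r):
                col = [h[k] for h in H]
                # For fixed p the log-likelihood decreases in sum_i w_i*|x_i - y|, so the weighted median maximizes it
                centers[j][k] = weighted_median(col, wj)
                mad = sum(w * abs(x - centers[j][k]) for w, x in zip(wj, col)) / tot
                p[j][k] = p_mle(mad)
        s = sum(tau)
        tau = [t / s for t in tau]
    return tau, centers, p
```

Because every component assigns positive probability to every integer haplotype, the fitted mixture also gives a non-zero frequency estimate for haplotypes that do not occur in the database. The number of subpopulations K is left to the caller in this sketch.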

MeSH terms

  • Algorithms
  • Chromosomes, Human, Y / genetics*
  • Computer Simulation
  • Forensic Genetics / methods
  • Gene Frequency
  • Genetic Markers
  • Haplotypes / genetics*
  • Humans
  • Models, Genetic*
  • Tandem Repeat Sequences / genetics*

Substances

  • Genetic Markers