Inferring the most likely geographical origin of mtDNA sequence profiles

Ann Hum Genet. 2004 Sep;68(Pt 5):461-71. doi: 10.1046/j.1529-8817.2004.00109.x.


In a number of practical cases it is important to determine the likely geographical origin of an individual or a biological sample. A dead body, old bones or a sample of semen may be available, and information on where the sample might come from can assist investigation or research. The first part of this paper is independent of any specific data structure: we formulate the task as a classification problem, and Bayes' theorem allows different sources of information or data to be reconciled conveniently. The main part of the paper involves high-dimensional data, for which simple, standard methods are not likely to work properly. Mitochondrial DNA (mtDNA) data are a typical example. We propose a procedure involving essentially two steps. First, principal component analysis is used to reduce the dimension of the data. Next, quadratic discriminant analysis performs the actual classification. A cross-validation procedure is implemented to select the optimal number of principal components. The importance of using separate data sets for model fitting and testing is emphasized. This method distinguishes well between individuals with a self-reported European (Icelandic or German) origin and SE Africans. In this case the error rate is 2.0%.
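
The role of Bayes' theorem described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `posterior` and `classify`, the two candidate origins, and every number are hypothetical, and the evidence sources are assumed to be conditionally independent given the origin.

```python
from math import prod


def posterior(prior, likelihoods):
    """Combine a prior over candidate origins with likelihoods from
    several evidence sources using Bayes' theorem.

    Assumption: the sources are conditionally independent given the
    origin, so their likelihoods multiply.
    """
    # Unnormalised posterior: prior times the product of the likelihoods.
    unnorm = {
        origin: p * prod(lik[origin] for lik in likelihoods)
        for origin, p in prior.items()
    }
    total = sum(unnorm.values())
    if total == 0:
        raise ValueError("evidence has zero probability under every origin")
    # Normalise so the posterior probabilities sum to one.
    return {origin: w / total for origin, w in unnorm.items()}


def classify(post):
    """Return the origin with the highest posterior probability."""
    return max(post, key=post.get)


# Hypothetical values for illustration only, not taken from the paper.
prior = {"Europe": 0.5, "SE Africa": 0.5}          # before seeing any data
mtdna_lik = {"Europe": 0.02, "SE Africa": 0.18}    # P(mtDNA profile | origin)
other_lik = {"Europe": 0.6, "SE Africa": 0.3}      # P(other evidence | origin)

post = posterior(prior, [mtdna_lik, other_lik])
print(post, classify(post))
```

In the procedure the abstract outlines, the mtDNA likelihood would come from the quadratic discriminant model fitted on the retained principal components rather than from fixed numbers as here.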

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Africa
  • Anthropology, Physical
  • Bayes Theorem
  • DNA, Mitochondrial / genetics*
  • Discriminant Analysis
  • Europe
  • Forensic Medicine
  • Genetics, Population*
  • Geography*
  • Humans
  • Models, Theoretical*
  • Population Dynamics*


Substances

  • DNA, Mitochondrial