Guidelines for standardizing the application of discriminant analysis of principal components to genotype data

Mol Ecol Resour. 2023 Apr;23(3):523-538. doi: 10.1111/1755-0998.13706. Epub 2022 Sep 7.

Abstract

Despite the popularity of discriminant analysis of principal components (DAPC) for studying population structure, there has been little discussion of best practice for this method. In this work, I provide guidelines for standardizing the application of DAPC to genotype data sets. An often overlooked fact is that DAPC generates a model describing genetic differences among a set of populations defined by a researcher. Appropriate parameterization of this model is critical for obtaining biologically meaningful results. I show that the number of leading PC axes used as predictors of among-population differences, p_axes, should not exceed the k-1 biologically informative PC axes that are expected for k effective populations in a genotype data set. This k-1 criterion for specifying p_axes is more appropriate than the widely used proportional variance criterion, which often results in a choice of p_axes ≫ k-1. DAPC parameterized with no more than the leading k-1 PC axes: (i) is more parsimonious; (ii) captures maximal among-population variation on biologically relevant predictors; (iii) is less sensitive to unintended interpretations of population structure; and (iv) is more generally applicable to independent sample sets. Assessing model fit should be routine practice and aids interpretation of population structure. It is imperative that researchers articulate their study goals, that is, testing a priori expectations vs. studying de novo inferred populations, because this has implications for how their DAPC results should be interpreted. The discussion and practical recommendations in this work provide the molecular ecology community with a roadmap for using DAPC in population genetic investigations.
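
The contrast between the two criteria for choosing the number of retained PC axes can be illustrated with a minimal sketch. It is not the author's code: the function names, the eigenvalue spectrum, and the 80% variance threshold are all hypothetical, chosen only to show how a proportional variance rule can retain far more axes than k-1 when a few large axes are followed by a long tail of small ones.

```python
def paxes_k_minus_1(k_effective):
    """Upper bound on retained PC axes under the k-1 criterion."""
    if k_effective < 2:
        raise ValueError("need at least two effective populations")
    return k_effective - 1


def paxes_proportional_variance(eigenvalues, threshold):
    """Smallest number of leading PC axes whose cumulative share of the
    total variance reaches `threshold` (a fraction in (0, 1])."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for n_axes, eigenvalue in enumerate(eigenvalues, start=1):
        cumulative += eigenvalue
        if cumulative / total >= threshold:
            return n_axes
    return len(eigenvalues)


# Hypothetical PCA eigenvalues (assumption, not real data): three large
# axes, as expected for k = 4 effective populations, followed by a long
# tail of 47 small, equal axes.
eigenvalues = [12.0, 9.0, 6.0] + [1.0] * 47
k = 4

p_k1 = paxes_k_minus_1(k)
p_pv = paxes_proportional_variance(eigenvalues, 0.8)  # assumed 80% rule
print(p_k1, p_pv)
```

With these made-up numbers the script prints `3 36`: the k-1 criterion keeps 3 axes, whereas the assumed 80% variance rule keeps 36, most of them from the low-variance tail.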

Keywords: DAPC; assignment tests; coalescent simulations; multivariate statistics; population genetic methods; population structure.

MeSH terms

  • Discriminant Analysis*
  • Genotype