Statistical phylogeography: methods of evaluating and minimizing inference errors

Mol Ecol. 2004 Apr;13(4):789-809. doi: 10.1046/j.1365-294x.2003.02041.x.

Abstract

Nested clade phylogeographical analysis (NCPA) has become a common tool in intraspecific phylogeography. To evaluate the validity of its inferences, NCPA was applied to actual data sets with 150 strong a priori expectations, the majority of which had not been analysed previously by NCPA. NCPA did well overall, but it sometimes failed to detect an expected event and less commonly resulted in a false positive. An examination of these errors suggested some alterations in the NCPA inference key, and these modifications reduce the incidence of false positives at the cost of a slight reduction in power. Moreover, NCPA does equally well in inferring events regardless of the presence or absence of other, unrelated events. A reanalysis of some recent computer simulations that are seemingly discordant with these results revealed that NCPA performed appropriately in these simulated samples and was not prone to a high rate of false positives under sampling assumptions that typify real data sets. NCPA makes a posteriori use of an explicit inference key for biological interpretation after statistical hypothesis testing. Alternatives to NCPA that claim that biological inference emerges directly from statistical testing are shown in fact to use an a priori inference key, albeit implicitly. It is argued that the a priori and a posteriori approaches to intraspecific phylogeography are complementary, not contradictory. Finally, cross-validation using multiple DNA regions is shown to be a powerful method of minimizing inference errors. A likelihood ratio hypothesis testing framework has been developed that allows testing of phylogeographical hypotheses, extends NCPA to testing specific hypotheses not within the formal inference key (such as the out-of-Africa replacement hypothesis of recent human evolution) and integrates intra- and interspecific phylogeographical inference.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computer Simulation
  • DNA / genetics*
  • Data Interpretation, Statistical
  • Evolution, Molecular*
  • Geography
  • Haplotypes / genetics
  • Likelihood Functions
  • Phylogeny*
  • Population Dynamics
  • Research Design*

Substances

  • DNA