We describe a method for estimating species trees that relies on replicated subsampling of large data matrices. One application of this method is phylogeographic research, which has long depended on large datasets that sample intensively from the geographic range of the focal species; these datasets allow systematicists to identify cryptic diversity and understand how contemporary and historical landscape forces influence genetic diversity. However, analyzing any large dataset can be computationally difficult, particularly when newly developed methods for species tree estimation are used. Here we explore the use of replicated subsampling, a potential solution to the problem posed by large datasets, with both a simulation study and an empirical analysis. In the simulations, we sample different numbers of alleles and loci, estimate species trees using STEM, and compare the estimated to the actual species tree. Our results indicate that subsampling three alleles per species for eight loci nearly always results in an accurate species tree topology, even in cases where the species tree was characterized by extremely rapid divergence. Even more modest subsampling effort, for example one allele per species and two loci, was more likely than not (>50%) to identify the correct species tree topology, indicating that in nearly all cases, computing the majority-rule consensus tree from replicated subsampling provides a good estimate of topology. These results were supported by estimating the correct species tree topology and reasonable branch lengths for an empirical 10-locus great ape dataset.
Copyright © 2010 Elsevier Inc. All rights reserved.