A comparison of supermatrix and supertree methods for multilocus phylogenetics using organismal datasets

Cladistics. 2013 Oct;29(5):560-566. doi: 10.1111/cla.12014. Epub 2013 Feb 18.

Abstract

It has been proposed that supertree approaches should be applied to large multilocus datasets to achieve computational tractability. Large datasets such as those derived from phylogenomics studies can be broken into many locus-specific tree searches and the resulting trees can be stitched together via a supertree method. Using simulated data, workers have reported that they can rapidly construct a supertree that is comparable to the results of heuristic tree search on the entire dataset. To test this assertion with organismal data, we compare tree length under the parsimony criterion and computational time for 20 multilocus datasets using supertree (SuperFine and SuperTriplets) and supermatrix (heuristic search in TNT) approaches. Tree length and computational times were compared among methods using the Wilcoxon matched-pairs signed rank test. Supermatrix searches produced significantly shorter trees than either supertree approach (SuperFine or SuperTriplets; P < 0.0002 in both cases). Moreover, the processing time of supermatrix search was significantly lower than SuperFine+locus-specific search (P < 0.01) but roughly equivalent to that of SuperTriplets+locus-specific search (P > 0.4, not significant). In conclusion, we show by using real rather than simulated data that there is no basis, either in time tractability or in tree length, for use of supertrees over heuristic tree search using a supermatrix for phylogenomics.
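The paired comparison described above can be illustrated with a minimal sketch of an exact two-sided Wilcoxon matched-pairs signed rank test. This is not the authors' analysis code. The function name `wilcoxon_signed_rank` and the tree lengths in the usage example are hypothetical and chosen only for illustration; they are not data from the study. The sketch assumes that zero differences are dropped and that tied absolute differences receive average ranks.

```python
def wilcoxon_signed_rank(x, y):
    """Exact two-sided Wilcoxon matched-pairs signed rank test.

    x, y: equal-length sequences of paired measurements (for example, tree
    lengths from two methods on the same datasets).
    Returns (w_plus, w_minus, p_value).
    """
    # Paired differences; pairs with zero difference are dropped (an assumption).
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, giving tied values their average rank.
    # Ranks are stored doubled so that average ranks remain integers.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks2 = [0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks2[order[k]] = i + j + 2  # twice the average of ranks i+1..j+1
        i = j + 1

    total2 = sum(ranks2)
    w_plus2 = sum(r for r, d in zip(ranks2, diffs) if d > 0)
    w_minus2 = total2 - w_plus2

    # Exact null distribution: each rank is positive or negative with
    # probability 1/2. counts[s] is the number of sign assignments whose
    # doubled positive-rank sum equals s.
    counts = [0] * (total2 + 1)
    counts[0] = 1
    for r in ranks2:
        for s in range(total2, r - 1, -1):
            counts[s] += counts[s - r]

    # Two-sided p-value from the smaller of the two rank sums.
    stat2 = min(w_plus2, w_minus2)
    tail = sum(counts[: stat2 + 1])
    p_value = min(1.0, 2 * tail / 2 ** n)
    return w_plus2 / 2, w_minus2 / 2, p_value


# Hypothetical tree lengths (steps) for six datasets; NOT values from the paper.
supermatrix_lengths = [1000, 1500, 2000, 2500, 3000, 3500]
supertree_lengths = [1010, 1520, 2030, 2540, 3050, 3560]

w_plus, w_minus, p = wilcoxon_signed_rank(supermatrix_lengths, supertree_lengths)
print(w_plus, w_minus, p)
```

In this hypothetical example every supermatrix tree is shorter than its supertree counterpart, so the positive rank sum is 0 and the p-value takes the smallest value attainable with six pairs, 2/2^6 = 0.03125. The same function could be applied to paired run times instead of tree lengths.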