AxPcoords & parallel AxParafit: statistical co-phylogenetic analyses on thousands of taxa

Alexandros Stamatakis; Alexander F Auch; Jan Meier-Kolthoff; Markus Göker

doi:10.1186/1471-2105-8-405

BMC Bioinformatics. 2007 Oct 22;8:405. doi: 10.1186/1471-2105-8-405.

Authors

Alexandros Stamatakis¹, Alexander F Auch, Jan Meier-Kolthoff, Markus Göker

Affiliation

¹ Ecole Polytechnique Fédérale de Lausanne, School of Computer & Communication Sciences, Laboratory for Computational Biology and Bioinformatics STATION 14, CH-1015 Lausanne, Switzerland. Alexandros.Stamatakis@epfl.ch

Abstract

Background: Current tools for co-phylogenetic analyses cannot cope with the continuous accumulation of phylogenetic data. The sophisticated statistical test for host-parasite co-phylogenetic analyses implemented in Parafit cannot process large datasets in reasonable time. The Parafit and DistPCoA programs are by far the most compute-intensive components of the Parafit analysis pipeline. We present AxParafit and AxPcoords (Ax stands for Accelerated), which are highly optimized versions of Parafit and DistPCoA, respectively.

Results: Both programs have been entirely rewritten in C. Through optimization of the algorithm and the C code, as well as integration of highly tuned BLAS and LAPACK routines, AxParafit runs 5-61 times faster than Parafit with a lower memory footprint (up to 35% reduction), and the performance benefit increases with dataset size. The MPI-based parallel implementation of AxParafit shows good scalability on up to 128 processors, even on medium-sized datasets. The parallel analysis with AxParafit on 128 CPUs for a medium-sized dataset with a 512 by 512 association matrix is more than 1,200/128 times faster per processor than the sequential Parafit run. AxPcoords is 8-26 times faster than DistPCoA and numerically stable on large datasets. We outline the substantial benefits of using parallel AxParafit through a large-scale empirical study on smut fungi and their host plants. To the best of our knowledge, this study represents the largest co-phylogenetic analysis to date.

Conclusion: The highly efficient AxPcoords and AxParafit programs allow, for the first time, large-scale co-phylogenetic analyses on several thousand taxa. In addition, AxParafit and AxPcoords have been integrated into the easy-to-use CopyCat tool.
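
The per-processor figure quoted in the Results ("more than 1,200/128 times faster per processor") can be checked with simple arithmetic. The sketch below assumes the figure means a total speedup of more than 1,200 over sequential Parafit, divided by the 128 CPUs used; the function name `per_processor_speedup` is hypothetical and not part of AxParafit.

```python
def per_processor_speedup(total_speedup: float, processors: int) -> float:
    """Divide a total speedup over a sequential baseline by the CPU count.

    Hypothetical helper for illustration only; it is not part of AxParafit.
    """
    if processors <= 0:
        raise ValueError("processors must be a positive integer")
    return total_speedup / processors


# Figures taken from the abstract: total speedup > 1,200 on 128 CPUs,
# measured against the sequential Parafit run (assumed interpretation).
lower_bound = per_processor_speedup(1200, 128)
print(f"per-processor speedup over sequential Parafit: > {lower_bound:.3f}")
```

Under this reading, each CPU contributes a speedup of more than about 9.4 relative to sequential Parafit. A value above 1 does not by itself indicate superlinear parallel scaling, because the baseline is the original Parafit rather than sequential AxParafit, which is itself reported as 5-61 times faster.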

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Base Sequence
Chromosome Mapping / methods*
Computer Simulation
Data Interpretation, Statistical
Evolution, Molecular*
Models, Genetic*
Models, Statistical
Molecular Sequence Data
Phylogeny
Sequence Analysis, DNA / methods*
Software*