Barking up the wrong treelength: the impact of gap penalty on alignment and tree accuracy

Kevin Liu; Serita Nelesen; Sindhu Raghavan; C Randal Linder; Tandy Warnow

doi:10.1109/TCBB.2008.63

IEEE/ACM Trans Comput Biol Bioinform. 2009 Jan-Mar;6(1):7-21. doi: 10.1109/TCBB.2008.63.

Affiliation

Kevin Liu: The University of Texas at Austin, Austin, TX 78712, USA. kliu@cs.utexas.edu

PMID: 19179695

Abstract

Several methods have been developed for simultaneous estimation of alignment and tree, of which POY is the most popular. In a 2007 paper published in Systematic Biology, Ogden and Rosenberg reported on a simulation study in which they compared POY to estimating the alignment using ClustalW and then analyzing the resultant alignment using maximum parsimony. They found that ClustalW+MP outperformed POY with respect to alignment and phylogenetic tree accuracy, and they concluded that simultaneous estimation techniques are not competitive with two-phase techniques. Our paper presents a simulation study in which we focus on the NP-hard optimization problem that POY addresses: minimizing treelength. Our study considers the impact of the gap penalty and suggests that the poor performance observed for POY by Ogden and Rosenberg is due to the simple gap penalties they used to score alignment/tree pairs. Our study suggests that optimizing under an affine gap penalty might produce alignments that are better than ClustalW alignments, and competitive with those produced by the best current alignment methods. We also show that optimizing under this affine gap penalty produces trees whose topological accuracy is better than ClustalW+MP, and competitive with the current best two-phase methods.
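The central quantity in the abstract, treelength scored under a gap penalty, can be made concrete with a short sketch. Several assumptions are not from the paper. Each edge costs the optimal pairwise edit cost between the sequences at its two endpoints. A gap of length L costs gap_open + gap_ext * L, so setting gap_open = 0 gives a simple per-position gap penalty. A mismatch costs 1. The names affine_edit_cost and treelength, and the default parameter values, are hypothetical. The pairwise cost uses Gotoh's three-matrix dynamic program. The sketch only scores a tree whose internal nodes already carry sequences. Searching over topologies and internal sequences to minimize this score is the NP-hard part and is not attempted here.

```python
from typing import Dict, List, Tuple

INF = float("inf")


def affine_edit_cost(s: str, t: str, sub: float = 1.0,
                     gap_open: float = 2.0, gap_ext: float = 1.0) -> float:
    """Minimum cost to turn s into t (Gotoh's algorithm).

    Hypothetical cost model (an assumption, not the paper's parameters):
    a mismatch costs `sub`, and a gap of length L costs gap_open + gap_ext * L.
    """
    n, m = len(s), len(t)
    # M: s[i-1] aligned to t[j-1]; X: s[i-1] against a gap; Y: t[j-1] against a gap
    M = [[INF] * (m + 1) for _ in range(n + 1)]
    X = [[INF] * (m + 1) for _ in range(n + 1)]
    Y = [[INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = gap_open + gap_ext * i  # one leading gap of length i
    for j in range(1, m + 1):
        Y[0][j] = gap_open + gap_ext * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = min(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            M[i][j] = diag + (0.0 if s[i - 1] == t[j - 1] else sub)
            # Opening a new gap pays gap_open; extending an existing one does not.
            X[i][j] = min(M[i - 1][j] + gap_open + gap_ext,
                          X[i - 1][j] + gap_ext,
                          Y[i - 1][j] + gap_open + gap_ext)
            Y[i][j] = min(M[i][j - 1] + gap_open + gap_ext,
                          Y[i][j - 1] + gap_ext,
                          X[i][j - 1] + gap_open + gap_ext)
    return min(M[n][m], X[n][m], Y[n][m])


def treelength(edges: List[Tuple[str, str]], labels: Dict[str, str],
               **costs: float) -> float:
    """Sum of pairwise edit costs over all edges of a fully labeled tree."""
    return sum(affine_edit_cost(labels[u], labels[v], **costs) for u, v in edges)


if __name__ == "__main__":
    # Hypothetical four-leaf tree ((a,b)x,(c,d)y) with invented sequences.
    edges = [("x", "a"), ("x", "b"), ("x", "y"), ("y", "c"), ("y", "d")]
    labels = {"a": "ACGT", "b": "ACGT", "c": "AT", "d": "ACGT",
              "x": "ACGT", "y": "ACGT"}
    print(treelength(edges, labels))                # 4.0 (affine penalty)
    print(treelength(edges, labels, gap_open=0.0))  # 2.0 (simple penalty)
```

Under the simple penalty, two separate one-base gaps cost the same as one two-base gap. Under the affine penalty the single longer gap is cheaper, which is one way the choice of penalty changes which alignment a treelength optimizer prefers.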

Publication types

Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Computational Biology
Computer Simulation
Evolution, Molecular*
Markov Chains*
Models, Genetic
Models, Statistical
Phylogeny*
Sequence Alignment*
Software
Systems Biology / methods*