Cophenetic Median Trees

IEEE/ACM Trans Comput Biol Bioinform. 2019 Sep-Oct;16(5):1459-1470. doi: 10.1109/TCBB.2018.2870173. Epub 2018 Sep 13.

Abstract

Median tree inference under path-difference metrics has shown great promise for large-scale phylogeny estimation. Similar to these metrics is the family of cophenetic metrics that originates from a classic dendrogram comparison method introduced more than 50 years ago. Despite the appeal of this family of metrics, the problem of computing median trees under cophenetic metrics has not been analyzed. Like other standard median tree problems relevant in practice, as we show here, this problem is also NP-hard. NP-hard median tree problems have been successfully addressed by local search heuristics that are solving thousands of instances of a corresponding (local neighborhood) search problem. For the local neighborhood search problem under a cophenetic metric, the best known (naïve) algorithm has a time complexity that is typically prohibitive for effective heuristic searches. Building on the pioneering work on path-difference median trees, we develop efficient algorithms for Manhattan and Euclidean cophenetic search problems that improve on the naïve solution by a linear and a quadratic factor, respectively. We demonstrate the performance and effectiveness of the resulting heuristic methods in a comparative study using benchmark empirical datasets.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Databases, Genetic
  • Genetic Techniques
  • Heuristics
  • Models, Genetic*
  • Phylogeny*