The SCJ Small Parsimony Problem for Weighted Gene Adjacencies

IEEE/ACM Trans Comput Biol Bioinform. Jul-Aug 2019;16(4):1364-1373. doi: 10.1109/TCBB.2017.2661761. Epub 2017 Jan 31.

Abstract

Reconstructing ancestral gene orders in a given phylogeny is a classical problem in comparative genomics. Most existing methods compare conserved features in the extant genomes of the phylogeny to define potential ancestral gene adjacencies. They then either try to reconstruct all ancestral genomes under a global evolutionary parsimony criterion or, focusing on a single ancestral genome, use a scaffolding approach to select a subset of ancestral gene adjacencies, generally aiming to reduce the fragmentation of the reconstructed ancestral genome. In this paper, we describe an exact algorithm for the Small Parsimony Problem that combines both approaches. We assume that gene adjacencies at internal nodes of the species phylogeny are weighted, and we introduce an objective function defined as a convex combination of these weights and the evolutionary cost under the Single-Cut-or-Join (SCJ) model. The weights of ancestral gene adjacencies can be obtained, for example, from recently available ancient DNA sequencing data, which provide a direct hint at the genome structure of the considered ancestor, or from a probabilistic analysis of gene adjacency evolution. We show that our problem variant is NP-hard and propose a Fixed-Parameter Tractable algorithm, based on the Sankoff-Rousseau dynamic programming algorithm, that also allows sampling of co-optimal solutions. We apply our approach to mammalian and bacterial data sets of differing complexity. We show that including adjacency weights in the objective significantly reduces the fragmentation of the reconstructed ancestral gene orders. An implementation is available at http://github.com/nluhmann/PhySca.
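The objective described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the PhySca implementation, and several details are assumptions made for illustration only: genomes are modeled as sets of adjacencies between gene extremities, the SCJ cost of a tree edge is the size of the symmetric difference of the two adjacency sets, and each adjacency kept at an internal node contributes a penalty in [0, 1], where a lower penalty stands for higher confidence. The names `adj`, `is_consistent`, `scj_distance`, `objective`, `penalty`, and `alpha` are hypothetical, and the exact form of the weight term in the paper may differ.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# An adjacency is an unordered pair of gene extremities, e.g. {"a_h", "b_t"}.
Adjacency = FrozenSet[str]
Genome = Set[Adjacency]


def adj(x: str, y: str) -> Adjacency:
    """Build an adjacency between two gene extremities (hypothetical helper)."""
    return frozenset((x, y))


def is_consistent(genome: Genome) -> bool:
    """A set of adjacencies is a valid genome if no extremity is used twice."""
    seen = set()
    for adjacency in genome:
        for extremity in adjacency:
            if extremity in seen:
                return False
            seen.add(extremity)
    return True


def scj_distance(g1: Genome, g2: Genome) -> int:
    """SCJ cost between two genomes: adjacencies to cut plus adjacencies to join."""
    return len(g1 ^ g2)


def objective(
    edges: Iterable[Tuple[str, str]],
    labeling: Dict[str, Genome],
    penalty: Dict[str, Dict[Adjacency, float]],
    alpha: float,
) -> float:
    """Convex combination of SCJ tree cost and adjacency penalties.

    Assumption: `penalty` holds one entry per internal node, mapping each
    candidate adjacency to a value in [0, 1] (lower = more confident).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    evolution = sum(scj_distance(labeling[u], labeling[v]) for u, v in edges)
    weights = sum(penalty[node][a] for node in penalty for a in labeling[node])
    return alpha * evolution + (1.0 - alpha) * weights


# Toy tree: one ancestor R with two extant genomes X and Y.
ab = adj("a_h", "b_t")
bc = adj("b_h", "c_t")
edges = [("R", "X"), ("R", "Y")]
leaves = {"X": {ab, bc}, "Y": {ab}}
penalty = {"R": {ab: 0.1, bc: 0.2}}

score_small = objective(edges, {**leaves, "R": {ab}}, penalty, alpha=0.5)
score_large = objective(edges, {**leaves, "R": {ab, bc}}, penalty, alpha=0.5)
```

In this toy instance both candidate labelings of R have the same SCJ cost, so the weight term alone decides between them. Setting `alpha` to 1 reduces the score to the pure SCJ parsimony cost, and lowering it gives the adjacency weights more influence.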

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Animals
  • Biological Evolution
  • Computational Biology / methods*
  • Computer Simulation
  • Databases, Genetic
  • Evolution, Molecular
  • Gene Order
  • Genetic Markers / genetics
  • Genome, Bacterial*
  • Genomics / methods*
  • Models, Genetic
  • Opossums / genetics
  • Phylogeny
  • Plasmids / metabolism
  • Probability
  • Reproducibility of Results
  • Swine / genetics
  • Yersinia / genetics

Substances

  • Genetic Markers