On the rank-distance median of 3 permutations

BMC Bioinformatics. 2018 May 8;19(Suppl 6):142. doi: 10.1186/s12859-018-2131-4.

Abstract

Background: Recently, Pereira Zanetti, Biller and Meidanis have proposed a new definition of a rearrangement distance between genomes. In this formulation, each genome is represented as a matrix, and the distance d is the rank distance between these matrices. Although defined in terms of matrices, the rank distance is equal to the minimum total weight of a series of weighted operations that leads from one genome to the other, including inversions, translocations, transpositions, and others. The computational complexity of the median-of-three problem according to this distance is currently unknown. The genome matrices are a special kind of permutation matrices, which we study in this paper. In their paper, the authors provide an [Formula: see text] algorithm for determining three candidate medians, prove the tight approximation ratio [Formula: see text], and provide a sufficient condition for their candidates to be true medians. They also conduct some experiments that suggest that their method is accurate on simulated and real data.

Results: In this paper, we extend their results and provide the following: Three invariants characterizing the problem of finding the median of 3 matrices A sufficient condition for uniqueness of medians that can be checked in O(n) A faster, [Formula: see text] algorithm for determining the median under this condition A new heuristic algorithm for this problem based on compressed sensing A [Formula: see text] algorithm that exactly solves the problem when the inputs are orthogonal matrices, a class that includes both permutations and genomes as special cases.

Conclusions: Our work provides the first proof that, with respect to the rank distance, the problem of finding the median of 3 genomes, as well as the median of 3 permutations, is exactly solvable in polynomial time, a result which should be contrasted with its NP-hardness for the DCJ (double cut-and-join) distance and most other families of genome rearrangement operations. This result, backed by our experimental tests, indicates that the rank distance is a viable alternative to the DCJ distance widely used in genome comparisons.

Keywords: Genome rearrangements; Median problem; Polynomial-time solvability.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computer Simulation
  • Databases, Genetic
  • Gene Rearrangement
  • Genome
  • Genomics / methods
  • Models, Genetic*
  • Mutation / genetics

Grants and funding