Background: It has recently been shown that fractionation, the random loss of excess gene copies after a whole genome duplication event, is a major cause of gene order disruption. When evolutionary distances between genomes are estimated from chromosomal rearrangements, fractionation inevitably leads to significant overestimation of classic rearrangement distances. This bias can be largely avoided when genomes are preprocessed by "consolidation", a procedure that identifies and accounts for regions of fractionation.
Results: We present a new consolidation algorithm that extends and improves previous work in three directions. First, we extend the notion of a fractionation region to use information provided by regions where fractionation is still ongoing; the new algorithm can optionally work with this extended definition. Second, the algorithm can process not only tetraploids but also genomes that have undergone hexaploidization or higher-order polyploidization events. Finally, the algorithm reduces the asymptotic time complexity of consolidation from quadratic to linear in the genome size. We apply the new algorithm both to plant genomes and to simulated data to study the effect of fractionation in ancient hexaploids.