flopp: Extremely Fast Long-Read Polyploid Haplotype Phasing by Uniform Tree Partitioning

J Comput Biol. 2022 Feb;29(2):195-211. doi: 10.1089/cmb.2021.0436. Epub 2022 Jan 17.

Abstract

Resolving haplotypes in polyploid genomes using phase information from sequencing reads is an important and challenging problem. We introduce two new mathematical formulations of polyploid haplotype phasing: (1) the min-sum max tree partition problem, which is a more flexible graphical metric than the standard minimum error correction (MEC) model in the polyploid setting, and (2) the uniform probabilistic error minimization model, which is a probabilistic analogue of the MEC model. We incorporate both formulations into a long-read-based polyploid haplotype phasing method called flopp. We show that flopp compares favorably with state-of-the-art algorithms: it is up to 30 times faster, with 2 times fewer switch errors, on 6× ploidy simulated data. Further, we show using real nanopore data that flopp can quickly reveal reasonable haplotype structures from the autotetraploid Solanum tuberosum (potato).
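
For readers unfamiliar with the MEC model that both formulations are compared against, the following is a minimal sketch of how an MEC score can be computed for a fixed partition of reads. It assumes the common textbook definition (each cluster's haplotype is the per-position majority allele, and the score is the total number of read alleles that disagree with it). The read representation and the names `consensus` and `mec_score` are illustrative assumptions, not flopp's code or API.

```python
from collections import Counter

# Assumption: a read is a dict mapping SNP index -> observed allele,
# and a partition is a list of clusters, one cluster per haplotype.

def consensus(reads):
    """Majority allele at every SNP position covered by the given reads."""
    counts = {}
    for read in reads:
        for pos, allele in read.items():
            counts.setdefault(pos, Counter())[allele] += 1
    return {pos: c.most_common(1)[0][0] for pos, c in counts.items()}

def mec_score(partition):
    """Total mismatches between each read and its cluster's consensus."""
    total = 0
    for reads in partition:
        hap = consensus(reads)
        for read in reads:
            total += sum(allele != hap[pos] for pos, allele in read.items())
    return total

# Toy example with two clusters (hypothetical data).
cluster_a = [{0: 0, 1: 0, 2: 1}, {1: 0, 2: 1, 3: 1}, {0: 0, 1: 1}]
cluster_b = [{0: 1, 1: 1}, {1: 1, 2: 0}]
score = mec_score([cluster_a, cluster_b])
print(score)
```

In this toy input the only disagreement is the third read of the first cluster at SNP 1, so the score counts a single error. A phasing method based on MEC searches for the partition into k clusters (k being the ploidy) that minimizes this quantity.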

Keywords: UPEM; haplotype phasing; long-reads; polyploid.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Computational Biology
  • Computer Simulation
  • Databases, Genetic / statistics & numerical data
  • Genome, Plant
  • Haplotypes*
  • Models, Genetic
  • Models, Statistical
  • Multigene Family
  • Polymorphism, Single Nucleotide
  • Polyploidy*
  • Sequence Analysis, DNA / statistics & numerical data
  • Software
  • Solanum tuberosum / genetics