Hap10: reconstructing accurate and long polyploid haplotypes using linked reads

BMC Bioinformatics. 2020 Jun 18;21(1):253. doi: 10.1186/s12859-020-03584-5.

Abstract

Background: Haplotype information is essential for many genetic and genomic analyses, including genotype-phenotype association studies in humans, animals and plants. Haplotype assembly reconstructs haplotypes from DNA sequencing reads. With the advent of new sequencing technologies, new algorithms are needed to produce long and accurate haplotypes. While a few linked-read haplotype assembly algorithms are available for diploid genomes, to the best of our knowledge no algorithm has yet been proposed for polyploids that specifically exploits linked reads.

Results: The first haplotyping algorithm designed for linked reads generated from a polyploid genome is presented. It is built on SDhaP, a typical short-read haplotyping method. First, haplotype-relevant information is extracted from the input aligned reads and called variants. Next, reads with the same barcode are combined to produce molecule-specific fragments. Then, these fragments are clustered into strongly connected components, which are used as input to a haplotype assembly core in order to estimate accurate and long haplotypes.
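The fragment-building and clustering steps described in the Results can be sketched as follows. This is a minimal illustration under stated assumptions, not the Hap10 implementation. The `Read` record, the hypothetical `max_gap` threshold used to split one barcode into several molecules, dropping sites with conflicting alleles inside a molecule, and linking two fragments whenever they share a variant site (ordinary connected components via union-find, in place of whatever graph Hap10 builds to obtain its strongly connected components) are all assumptions made for illustration. The SDhaP assembly core is not reproduced.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Read(NamedTuple):
    """Hypothetical read record (a real pipeline would parse BAM + VCF)."""
    barcode: str             # linked-read barcode
    pos: int                 # alignment start in bp
    alleles: Dict[int, int]  # variant index -> observed allele (0 = ref, 1 = alt, ...)


def build_fragments(reads: List[Read], max_gap: int = 50_000) -> List[Dict[int, int]]:
    """Merge reads sharing a barcode into molecule-specific fragments.

    Assumption: reads with one barcode are split into separate molecules when
    consecutive reads are more than max_gap bp apart (hypothetical threshold).
    Sites with conflicting allele calls inside one molecule are dropped.
    """
    by_barcode: Dict[str, List[Read]] = defaultdict(list)
    for r in reads:
        by_barcode[r.barcode].append(r)

    fragments: List[Dict[int, int]] = []
    for barcode in sorted(by_barcode):
        group = sorted(by_barcode[barcode], key=lambda r: r.pos)
        molecules: List[List[Read]] = [[group[0]]]
        for prev, cur in zip(group, group[1:]):
            if cur.pos - prev.pos > max_gap:
                molecules.append([])  # large gap: start a new molecule
            molecules[-1].append(cur)

        for mol in molecules:
            calls: Dict[int, set] = defaultdict(set)
            for r in mol:
                for var, allele in r.alleles.items():
                    calls[var].add(allele)
            # keep only sites observed with a single, consistent allele
            frag = {v: next(iter(a)) for v, a in calls.items() if len(a) == 1}
            if len(frag) >= 2:  # a fragment needs >= 2 variants to inform phasing
                fragments.append(frag)
    return fragments


def connected_components(fragments: List[Dict[int, int]]) -> List[List[int]]:
    """Group fragment indices; fragments sharing any variant are linked (union-find)."""
    parent = list(range(len(fragments)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_at: Dict[int, int] = {}  # variant index -> first fragment covering it
    for i, frag in enumerate(fragments):
        for var in frag:
            if var in first_at:
                parent[find(i)] = find(first_at[var])
            else:
                first_at[var] = i

    comps: Dict[int, List[int]] = defaultdict(list)
    for i in range(len(fragments)):
        comps[find(i)].append(i)
    return sorted(comps.values())


if __name__ == "__main__":
    reads = [
        Read("AAA", 100, {0: 0, 1: 1}),
        Read("AAA", 600, {1: 1, 2: 0}),
        Read("CCC", 900, {2: 1, 3: 1}),
        Read("GGG", 5_000, {10: 0, 11: 1}),
        Read("AAA", 200_000, {20: 1}),  # same barcode, far away: separate molecule
    ]
    frags = build_fragments(reads)
    print(frags)                        # three fragments; the lone far read is dropped
    print(connected_components(frags))  # [[0, 1], [2]]
```

In this sketch each component would then be phased independently by the assembly core: fragments in different components share no variant site, so nothing in the reads constrains their relative phase.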

Conclusions: Hap10 is a novel algorithm for haplotype assembly of polyploid genomes using linked reads. The performance of the algorithm is evaluated in a number of simulation scenarios, and its applicability is demonstrated on a real sweet potato dataset.

Keywords: 10X genomics; Clustering; Computational genetics; DNA sequence analysis; Haplotype; Linked read; Mathematical optimization; Polyploid genomes; Synthetic long reads.

MeSH terms

  • Algorithms
  • Genome, Human / genetics*
  • Haplotypes / physiology*
  • Humans
  • Polyploidy*