Complex genome assembly based on long-read sequencing

Brief Bioinform. 2022 Sep 20;23(5):bbac305. doi: 10.1093/bib/bbac305.

Abstract

High-quality, chromosome-scale genome sequences provide an important basis for downstream genomic analysis. In particular, the construction of haplotype-resolved and complete genomes plays a key role in genome annotation, mutation detection, evolutionary analysis, gene function research, comparative genomics and other areas. However, whole-genome short-read sequencing struggles to produce a complete assembly when the genome is complex, with extensive duplication and heterozygosity. The emergence of long-read sequencing technology has greatly improved the completeness of complex genome assemblies. We review a variety of computational methods for complex genome assembly and describe in detail the theories, innovations and shortcomings of collapsed, semi-collapsed and uncollapsed assemblers based on long reads. Of the three approaches, uncollapsed assembly represents genomes most accurately and completely. In addition, genome assembly is closely tied to haplotype reconstruction: uncollapsed assembly achieves haplotype reconstruction, and haplotype reconstruction in turn facilitates uncollapsed assembly. We hope that, in the future, gapless, telomere-to-telomere and accurate assembly of complex genomes can be achieved routinely with a simple process or a single tool.

Keywords: genome assembly; haplotype; long-read sequencing.

Publication types

  • Review
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Chromosome Mapping
  • Genome
  • Genomics*
  • High-Throughput Nucleotide Sequencing* / methods
  • Sequence Analysis, DNA / methods