Polypolish: Short-read polishing of long-read bacterial genome assemblies

PLoS Comput Biol. 2022 Jan 24;18(1):e1009802. doi: 10.1371/journal.pcbi.1009802. eCollection 2022 Jan.

Abstract

Long-read-only bacterial genome assemblies usually contain residual errors, most commonly homopolymer-length errors. Short-read polishing tools can use short reads to fix these errors, but most rely on short-read alignment, which is unreliable in repeat regions. Errors in such regions are therefore challenging to fix and often remain after short-read polishing. Here we introduce Polypolish, a new short-read polisher which uses all-per-read alignments (every location to which each read aligns, not just its single best placement) to repair errors in repeat sequences that other polishers cannot. Polypolish performed well in benchmarking tests using both simulated and real reads, and it almost never introduced errors during polishing. The best results were achieved by using Polypolish in combination with other short-read polishers.
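The abstract does not describe how all-per-read alignments are turned into corrections. The following is a minimal, hypothetical Python sketch of the general idea, not Polypolish's implementation. It assumes ungapped alignments, a made-up input format, and illustrative thresholds (`min_depth`, `min_fraction`); the function names `build_pileup` and `polish` are invented for this example. It shows why counting every alignment of a read matters. A read from a two-copy repeat matches the error-free copy perfectly and the erroneous copy with one mismatch, so a best-only placement leaves the erroneous copy with no coverage, while an all-per-read pileup still supplies evidence there.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical input format (an assumption of this sketch): each read name maps
# to a list of ungapped alignments, each given as (0-based start on the
# assembly, aligned read sequence). The first entry is treated as the read's
# single "best" placement.
Alignments = Dict[str, List[Tuple[int, str]]]


def build_pileup(assembly: str, alignments: Alignments,
                 all_per_read: bool) -> List[Counter]:
    """Count the read bases observed at each assembly position."""
    pileup = [Counter() for _ in assembly]
    for read_alignments in alignments.values():
        # Use all alignments of the read, or only its first (best) one.
        used = read_alignments if all_per_read else read_alignments[:1]
        for start, seq in used:
            for offset, base in enumerate(seq):
                pileup[start + offset][base] += 1
    return pileup


def polish(assembly: str, alignments: Alignments, all_per_read: bool = True,
           min_depth: int = 3, min_fraction: float = 0.8) -> str:
    """Change a base only where the pileup clearly supports a single base.

    min_depth and min_fraction are illustrative values, not Polypolish's
    actual parameters or defaults.
    """
    pileup = build_pileup(assembly, alignments, all_per_read)
    polished = []
    for original, counts in zip(assembly, pileup):
        depth = sum(counts.values())
        if depth > 0 and depth >= min_depth:
            base, count = counts.most_common(1)[0]
            if count / depth >= min_fraction:
                polished.append(base)
                continue
        # Too little or conflicting evidence: keep the original base.
        polished.append(original)
    return "".join(polished)


# Toy data (hypothetical): a two-copy repeat whose second copy carries an error
# (C instead of A at position 19). Each repeat read aligns perfectly to copy 1
# and with one mismatch to copy 2, so a best-only aligner places it on copy 1.
EXAMPLE_ASSEMBLY = "TTTT" + "GATTACA" + "CCCC" + "GATTCCA" + "GGGG"
EXAMPLE_READS: Alignments = {
    f"read{i}": [(4, "GATTACA"), (15, "GATTACA")] for i in range(4)
}

if __name__ == "__main__":
    # Best-only: copy 2 receives no reads, so its error remains.
    print(polish(EXAMPLE_ASSEMBLY, EXAMPLE_READS, all_per_read=False))
    # TTTTGATTACACCCCGATTCCAGGGG
    # All-per-read: copy 2 receives the same reads, so the error is fixed.
    print(polish(EXAMPLE_ASSEMBLY, EXAMPLE_READS, all_per_read=True))
    # TTTTGATTACACCCCGATTACAGGGG
```

The rule is deliberately conservative: a position keeps its original base unless the reads agree strongly, so low or conflicting coverage never triggers a change. Real alignments contain indels and come from SAM output; an aligner must be asked to report all hits per read (BWA-MEM's `-a` option is one way) for a pileup like this to be built.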

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • DNA, Bacterial / genetics
  • Genome, Bacterial / genetics*
  • Genomics / methods*
  • High-Throughput Nucleotide Sequencing / methods*
  • Repetitive Sequences, Nucleic Acid / genetics
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA / methods*

Substances

  • DNA, Bacterial

Grants and funding

This work was supported, in whole or in part, by the Bill & Melinda Gates Foundation (KEH, grant number OPP1175797). Under the grant conditions of the Foundation, a Creative Commons Attribution 4.0 Generic License has already been assigned to the Author Accepted Manuscript version that might arise from this submission. This work was also supported by an Australian Government Research Training Program Scholarship (RRW), and a Senior Medical Research Fellowship from the Sylvia and Charles Viertel Charitable Foundation (KEH). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.