Efficiency of PacBio long read correction by 2nd generation Illumina sequencing

Genomics. 2019 Jan;111(1):43-49. doi: 10.1016/j.ygeno.2017.12.011. Epub 2017 Dec 18.

Abstract

Long sequencing reads offer unprecedented opportunities for the analysis and reconstruction of complex genomic regions. However, the gain in sequence length often comes at the cost of quality. Several approaches have therefore recently been proposed to enhance the quality of long sequencing reads (e.g., higher sequencing coverage, hybrid assembly, or sequence correction). A simple and cost-effective approach is to use high-quality 2nd generation sequencing data to improve the quality of long reads. We designed a dedicated testing procedure and selected universal programs for long read correction whose output sequences can be used in further genomic and transcriptomic studies. Our results show that HALC is the best choice for correcting long PacBio reads when both read size and quality are the main focus of the analysis. However, the tested tools show some unexpected behaviors, including read trimming and fragmentation.

Keywords: Illumina; Long read sequencing; NGS sequencing; PacBio; Sequence correction.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Animals
  • Databases, Genetic
  • Escherichia coli / genetics
  • Genomics
  • High-Throughput Nucleotide Sequencing*
  • Humans
  • Oryza / genetics
  • Sequence Analysis, DNA*
  • Trypanosoma / genetics
  • Yeasts / genetics