Performance of neural network basecalling tools for Oxford Nanopore sequencing
- PMID: 31234903
- PMCID: PMC6591954
- DOI: 10.1186/s13059-019-1727-y
Abstract
Background: Basecalling, the computational process of translating raw electrical signal to nucleotide sequence, is of critical importance to the sequencing platforms produced by Oxford Nanopore Technologies (ONT). Here, we examine the performance of different basecalling tools, looking at accuracy at the level of bases within individual reads and at majority-rule consensus basecalls in an assembly. We also investigate some additional aspects of basecalling: training using a taxon-specific dataset, using a larger neural network model and improving consensus basecalls in an assembly by additional signal-level analysis with Nanopolish.
Results: Training basecallers on taxon-specific data results in a significant boost in consensus accuracy, mostly due to the reduction of errors in methylation motifs. A larger neural network is able to improve both read and consensus accuracy, but at a cost to speed. Improving consensus sequences ('polishing') with Nanopolish somewhat negates the accuracy differences in basecallers, but pre-polish accuracy does have an effect on post-polish accuracy.
Conclusions: Basecalling accuracy has seen significant improvements over the last 2 years. The current version of ONT's Guppy basecaller performs well overall, with good accuracy and fast performance. If higher accuracy is required, users should consider producing a custom model using a larger neural network and/or training data from the same species.
Keywords: Basecalling; Long-read sequencing; Oxford Nanopore.
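The abstract distinguishes two accuracy measures: accuracy of bases within individual reads, and accuracy of majority-rule consensus basecalls. The sketch below illustrates that distinction on toy data. It assumes the reads are already aligned to a common coordinate system, with '-' marking gaps. The function names `read_identity` and `majority_consensus` are hypothetical, and this is not the pipeline used in the study, which the abstract does not describe in detail.

```python
from collections import Counter
from typing import Sequence


def read_identity(read: str, reference: str) -> float:
    """Fraction of alignment columns where the read matches the reference.

    Assumes both strings are pre-aligned to the same length ('-' = gap).
    """
    if len(read) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    if not read:
        return 0.0
    matches = sum(1 for a, b in zip(read, reference) if a == b and a != "-")
    return matches / len(read)


def majority_consensus(aligned_reads: Sequence[str]) -> str:
    """Majority-rule consensus: the most common symbol in each column.

    Columns where a gap wins the vote are dropped from the output.
    Ties go to the symbol seen first in the column (Counter ordering).
    """
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(r) != length for r in aligned_reads):
        raise ValueError("all reads must be aligned to the same length")
    consensus = []
    for column in zip(*aligned_reads):
        base, _count = Counter(column).most_common(1)[0]
        if base != "-":
            consensus.append(base)
    return "".join(consensus)


# Toy data (hypothetical): one read has a substitution, one has a deletion.
reference = "ACGTAC"
reads = ["ACGTAC", "ACGAAC", "ACGTAC", "AC-TAC"]

per_read = [read_identity(r, reference) for r in reads]
consensus = majority_consensus(reads)
print(per_read)
print(consensus)
```

In this toy case the errors sit at different positions in different reads, so the column-wise vote removes them even though two individual reads are imperfect. An error that recurs at the same position in most reads, such as one tied to a methylation motif, would survive the vote. That is consistent with the abstract's observation that taxon-specific training improves consensus accuracy mainly by reducing errors in methylation motifs.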
Conflict of interest statement
In July 2018, Ryan Wick attended a hackathon in Bermuda at ONT’s expense. ONT also paid his travel, accommodation and registration to attend the London Calling (2017) and Nanopore Community Meeting (2017) events as an invited speaker.
