Hybrid error correction and de novo assembly of single-molecule sequencing reads
- PMID: 22750884
- PMCID: PMC3707490
- DOI: 10.1038/nbt.2280
Abstract
Single-molecule sequencing instruments can generate multikilobase sequences with the potential to greatly improve genome and transcriptome assembly. However, the error rates of single-molecule reads are high, which has limited their use thus far to resequencing bacteria. To address this limitation, we introduce a correction algorithm and assembly strategy that uses short, high-fidelity sequences to correct the error in single-molecule sequences. We demonstrate the utility of this approach on reads generated by a PacBio RS instrument from phage, prokaryotic and eukaryotic whole genomes, including the previously unsequenced genome of the parrot Melopsittacus undulatus, as well as for RNA-Seq reads of the corn (Zea mays) transcriptome. Our long-read correction achieves >99.9% base-call accuracy, leading to substantially better assemblies than current sequencing strategies: in the best example, the median contig size was quintupled relative to high-coverage, second-generation assemblies. Greater gains are predicted if read lengths continue to increase, including the prospect of single-contig bacterial chromosome assembly.
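The core idea in the abstract, using short, high-fidelity sequences to correct errors in a long single-molecule read, can be illustrated with a toy majority-vote consensus. This is only a minimal sketch under strong assumptions, not the paper's algorithm: the function name `correct_long_read`, the `min_votes` parameter, and the pre-computed gap-free placements of short reads are all hypothetical, and the sketch handles substitution errors only, whereas real single-molecule reads also contain insertions and deletions.

```python
from collections import Counter


def correct_long_read(long_read, placed_short_reads, min_votes=2):
    """Toy majority-vote correction of substitution errors in one long read.

    placed_short_reads is a list of (offset, sequence) pairs, where offset is
    the 0-based position in long_read at which the short read is assumed to
    align without gaps. This input is hypothetical: a real pipeline would
    have to compute the alignments itself and handle indels.
    """
    # One vote counter per long-read position.
    votes = [Counter() for _ in long_read]
    for offset, seq in placed_short_reads:
        for i, base in enumerate(seq):
            pos = offset + i
            if 0 <= pos < len(long_read):
                votes[pos][base] += 1

    corrected = []
    for original, column in zip(long_read, votes):
        if column:
            base, count = column.most_common(1)[0]
            # Replace only when enough short reads agree; otherwise keep
            # the base originally called in the long read.
            corrected.append(base if count >= min_votes else original)
        else:
            # No short-read coverage here: leave the base untouched.
            corrected.append(original)
    return "".join(corrected)


if __name__ == "__main__":
    noisy = "ACGTTCGTAC"  # position 4 is miscalled
    short_reads = [(0, "ACGTAC"), (2, "GTACGT"), (4, "ACGTAC")]
    print(correct_long_read(noisy, short_reads))
```

Requiring a minimum number of agreeing short reads before overwriting a base is one simple way to avoid propagating an error from a single misplaced short read; positions with too little coverage are left as called.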
