TideHunter: efficient and sensitive tandem repeat detection from noisy long-reads using seed-and-chain
- PMID: 31510677
- PMCID: PMC6612900
- DOI: 10.1093/bioinformatics/btz376
Abstract
Motivation: Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) sequencing technologies can produce long reads of up to tens of kilobases, but with high error rates. To reduce sequencing error, Rolling Circle Amplification (RCA) has been used during library preparation to amplify circularized template molecules. The linear products of RCA contain multiple tandem copies of the template molecule. With additional in silico processing, these tandem copies can be collapsed into a consensus sequence that is more accurate than the original raw reads. Existing pipelines, which use alignment-based methods to discover tandem repeat patterns in long reads, are either inefficient or lack sensitivity.
Results: We present a novel tandem repeat detection and consensus calling tool, TideHunter, which efficiently discovers tandem repeat patterns and generates high-quality consensus sequences from amplified, tandemly repeated long-read sequencing data. TideHunter works with noisy long reads (PacBio and ONT) at error rates of up to 20% and places no limit on the maximum repeat pattern size. We benchmarked TideHunter on simulated and real datasets with varying error rates and repeat pattern sizes. TideHunter is tens of times faster than state-of-the-art methods and has higher sensitivity and accuracy.
Availability and implementation: TideHunter is written in C, is open source, and is available at https://github.com/yangao07/TideHunter.
© The Author(s) 2019. Published by Oxford University Press.
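Illustrative sketch: the abstract describes finding a tandem repeat pattern in a noisy read and collapsing the copies into a consensus. The toy Python example below shows that idea in its simplest form and is not TideHunter's implementation. It assumes substitution errors only, estimates the repeat period as the most common distance between consecutive occurrences of the same k-mer (the "seed" step), and omits the chaining step and any indel-aware alignment. The function names `estimate_period` and `majority_consensus`, the k-mer size, and the example read are all hypothetical.

```python
from collections import Counter
from typing import Optional


def estimate_period(read: str, k: int = 8) -> Optional[int]:
    """Guess the repeat period from distances between repeated k-mers.

    Assumption: the most frequent distance between consecutive
    occurrences of an identical k-mer equals the repeat unit length.
    """
    last_seen = {}          # k-mer -> position of its latest occurrence
    distances = Counter()   # distance -> number of k-mer pairs supporting it
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if kmer in last_seen:
            distances[i - last_seen[kmer]] += 1
        last_seen[kmer] = i
    if not distances:
        return None
    return distances.most_common(1)[0][0]


def majority_consensus(read: str, period: int) -> str:
    """Column-wise majority vote over consecutive full-length copies.

    Assumption: copies are aligned without gaps (substitutions only).
    """
    copies = [read[i:i + period]
              for i in range(0, len(read) - period + 1, period)]
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*copies))


# Hypothetical example: five copies of a 16-base unit with two substitutions.
unit = "ACGTTGCAAGGCTTAC"
noisy = list(unit * 5)
noisy[19] = "A"  # substitution in the second copy
noisy[58] = "C"  # substitution in the fourth copy
read = "".join(noisy)

period = estimate_period(read)
consensus = majority_consensus(read, period)
print(period, consensus == unit)
```

Real long reads contain frequent insertions and deletions, so a fixed-width column vote like this would drift out of register; a practical tool has to chain seed hits and align the copies before calling a consensus.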
