YAHA: fast and flexible long-read alignment with optimal breakpoint detection
- PMID: 22829624
- PMCID: PMC3463118
- DOI: 10.1093/bioinformatics/bts456
Abstract
Motivation: With improved short-read assembly algorithms and the recent development of long-read sequencers, split mapping will soon be the preferred method for structural variant (SV) detection. Yet current alignment tools are not well suited to this task.
Results: We present YAHA, a fast and flexible hash-based aligner. YAHA is as fast and accurate as BWA-SW at finding the single best alignment per query, and it is dramatically faster and more sensitive than both SSAHA2 and MegaBLAST at finding all possible alignments. Other aligners report either one alignment or all alignments per query, or select alignments with simple heuristics; YAHA instead uses a directed acyclic graph (DAG) to find the optimal set of alignments covering a query, scored with a biologically relevant breakpoint penalty. YAHA can also report multiple mappings per defined segment of the query. We show that YAHA detects more breakpoints in less time than BWA-SW across all SV classes, and that it especially excels at complex SVs comprising multiple breakpoints.
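The abstract does not give YAHA's exact scoring, so the following is only a minimal sketch of the general idea of choosing an optimal set of alignments through a DAG, not YAHA's implementation. It assumes (hypothetically) that each candidate alignment has a query interval and a single score, that chosen alignments may not overlap on the query, and that every breakpoint between two consecutive alignments costs a fixed penalty. The names `Alignment`, `best_cover`, and `breakpoint_penalty` are illustrative only, and YAHA's option to report multiple mappings per query segment is not modeled.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Alignment:
    """A candidate local alignment of part of the query (hypothetical fields)."""
    name: str     # label for the reference hit
    q_start: int  # 0-based start on the query (inclusive)
    q_end: int    # end on the query (exclusive); assumed > q_start
    score: int    # alignment score, e.g. number of matching bases


def best_cover(alns: Sequence[Alignment], breakpoint_penalty: int) -> List[Alignment]:
    """Return the highest-scoring chain of query-disjoint alignments.

    Nodes are alignments; an edge a -> b exists when b starts at or after
    a ends on the query. Each edge is one breakpoint and costs
    `breakpoint_penalty` (an assumed, simplified penalty model). Edges only
    point forward along the query, so the graph is acyclic and the best
    path is found by dynamic programming in query-start order.
    """
    order = sorted(alns, key=lambda a: (a.q_start, a.q_end))
    n = len(order)
    if n == 0:
        return []
    best = [a.score for a in order]  # best path score ending at node j
    prev = [-1] * n                  # back-pointers for path reconstruction
    for j in range(n):
        for i in range(j):
            if order[i].q_end <= order[j].q_start:
                cand = best[i] + order[j].score - breakpoint_penalty
                if cand > best[j]:
                    best[j] = cand
                    prev[j] = i
    # Trace back from the end node with the highest path score.
    j = max(range(n), key=lambda k: best[k])
    path: List[Alignment] = []
    while j != -1:
        path.append(order[j])
        j = prev[j]
    return path[::-1]


if __name__ == "__main__":
    candidates = [
        Alignment("A", 0, 500, 480),    # first half of the query, one locus
        Alignment("B", 500, 1000, 500), # second half, a different locus
        Alignment("C", 0, 1000, 700),   # one weaker hit spanning the whole query
    ]
    print([a.name for a in best_cover(candidates, 30)])   # ['A', 'B']
    print([a.name for a in best_cover(candidates, 300)])  # ['C']
```

With a small penalty the split pair A + B (480 + 500 - 30 = 950) beats the single spanning hit C (700), so a breakpoint is called; with a large penalty (680 < 700) the single alignment wins. The penalty therefore controls how much alignment evidence is required before a breakpoint is reported.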
Availability: YAHA is currently supported on 64-bit Linux systems. Binaries and sample data are freely available for download from http://faculty.virginia.edu/irahall/YAHA.
Contact: imh4y@virginia.edu.
Similar articles
- YOABS: yet other aligner of biological sequences--an efficient linearly scaling nucleotide aligner. Bioinformatics. 2012 Apr 15;28(8):1070-7. doi: 10.1093/bioinformatics/bts102. Epub 2012 Mar 7. PMID: 22402614
- Label-guided seed-chain-extend alignment on annotated De Bruijn graphs. Bioinformatics. 2024 Jun 28;40(Suppl 1):i337-i346. doi: 10.1093/bioinformatics/btae226. PMID: 38940164. Free PMC article.
- Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 2010 Mar 1;26(5):589-95. doi: 10.1093/bioinformatics/btp698. Epub 2010 Jan 15. PMID: 20080505. Free PMC article.
- BatMis: a fast algorithm for k-mismatch mapping. Bioinformatics. 2012 Aug 15;28(16):2122-8. doi: 10.1093/bioinformatics/bts339. Epub 2012 Jun 10. PMID: 22689389
- Technology dictates algorithms: recent developments in read alignment. Genome Biol. 2021 Aug 26;22(1):249. doi: 10.1186/s13059-021-02443-7. PMID: 34446078. Free PMC article. Review.
Cited by
- kngMap: Sensitive and Fast Mapping Algorithm for Noisy Long Reads Based on the K-Mer Neighborhood Graph. Front Genet. 2022 May 5;13:890651. doi: 10.3389/fgene.2022.890651. eCollection 2022. PMID: 35601495. Free PMC article.
- Combining probabilistic alignments with read pair information improves accuracy of split-alignments. Bioinformatics. 2018 Nov 1;34(21):3631-3637. doi: 10.1093/bioinformatics/bty398. PMID: 29790902. Free PMC article.
- PBHoney: identifying genomic variants via long-read discordance and interrupted mapping. BMC Bioinformatics. 2014 Jun 10;15:180. doi: 10.1186/1471-2105-15-180. PMID: 24915764. Free PMC article.
- Diverse, Biologically Relevant, and Targetable Gene Rearrangements in Triple-Negative Breast Cancer and Other Malignancies. Cancer Res. 2016 Aug 15;76(16):4850-60. doi: 10.1158/0008-5472.CAN-16-0058. Epub 2016 May 26. PMID: 27231203. Free PMC article.
- CtIP-mediated DNA resection is dispensable for IgH class switch recombination by alternative end-joining. Proc Natl Acad Sci U S A. 2020 Oct 13;117(41):25700-25711. doi: 10.1073/pnas.2010972117. Epub 2020 Sep 28. PMID: 32989150. Free PMC article.
