Scaling read aligners to hundreds of threads on general-purpose processors
- PMID: 30020410
- PMCID: PMC6361242
- DOI: 10.1093/bioinformatics/bty648
Abstract
Motivation: General-purpose processors can now contain many dozens of processor cores and support hundreds of simultaneous threads of execution. To make best use of these threads, genomics software must contend with new and subtle computer architecture issues. We discuss some of these and propose methods for improving thread scaling in tools that analyze each read independently, such as read aligners.
Results: We implement these methods in new versions of Bowtie, Bowtie 2 and HISAT. We greatly improve thread scaling in many scenarios, including on the recent Intel Xeon Phi architecture. We also highlight how bottlenecks are exacerbated by variable-record-length file formats like FASTQ and suggest changes that enable superior scaling.
Availability and implementation: Experiments for this study: https://github.com/BenLangmead/bowtie-scaling.
Bowtie: http://bowtie-bio.sourceforge.net.
Bowtie 2: http://bowtie-bio.sourceforge.net/bowtie2.
HISAT: http://www.ccb.jhu.edu/software/hisat.
Supplementary information: Supplementary data are available at Bioinformatics online.
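Illustrative note: the abstract's point about variable-record-length formats can be sketched in a few lines. This is a minimal sketch and not the authors' implementation. The function names and the fixed-size layout are hypothetical and assumed only for illustration. It shows that finding record boundaries in FASTQ requires scanning the input for newlines, whereas a fixed-length layout would let any thread compute a record's offset directly.

```python
def fastq_record_offsets(data: bytes) -> list[int]:
    """Return the byte offset of each FASTQ record (assumes 4 lines per record).

    Every line must be scanned for its newline, so boundaries cannot be
    known without reading the preceding bytes.
    """
    offsets = []
    pos = 0
    line_no = 0
    while pos < len(data):
        if line_no % 4 == 0:
            offsets.append(pos)  # first line of a record (the '@' header)
        nl = data.find(b"\n", pos)
        pos = len(data) if nl == -1 else nl + 1
        line_no += 1
    return offsets


def fixed_record_offset(index: int, record_size: int) -> int:
    """Offset of record `index` in a hypothetical fixed-size layout.

    No scanning is needed, so threads could seek to their records independently.
    """
    return index * record_size


if __name__ == "__main__":
    data = b"@r1\nACGT\n+\nIIII\n@r2\nACGTACGT\n+\nIIIIIIII\n"
    print(fastq_record_offsets(data))
    print(fixed_record_offset(3, 64))
```

In the variable-length case the scan is inherently sequential, which is one plausible reason input parsing can become a shared bottleneck as thread counts grow.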
