SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation
- PMID: 27706213
- PMCID: PMC5051824
- DOI: 10.1371/journal.pone.0163962
Abstract
FASTA and FASTQ are basic and ubiquitous formats for storing nucleotide and protein sequences. Common manipulations of FASTA/Q files include converting, searching, filtering, deduplication, splitting, shuffling, and sampling. Existing tools implement only some of these manipulations, often inefficiently, and some are available only for certain operating systems. Furthermore, the complicated installation of required packages and running environments can make these programs less user-friendly. This paper describes SeqKit, a cross-platform, ultrafast, comprehensive toolkit for FASTA/Q processing. SeqKit provides executable binary files for all major operating systems, including Windows, Linux, and Mac OS X, and can be used directly without any dependencies or pre-configuration. SeqKit demonstrates competitive performance in execution time and memory usage compared to similar tools. The efficiency and usability of SeqKit enable researchers to rapidly accomplish common FASTA/Q file manipulations. SeqKit is open source and available on GitHub at https://github.com/shenwei356/seqkit.
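To make two of the manipulations named above concrete, the following is a minimal Python sketch of deduplication and length filtering on FASTA records. It is an illustration only and not SeqKit's implementation (SeqKit is a standalone binary); the function names `read_fasta`, `dedup_by_sequence`, and `filter_by_length`, the case-insensitive comparison, and the example records are all assumptions made for this sketch.

```python
import hashlib
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dedup_by_sequence(records):
    """Keep the first record for each distinct sequence (case-insensitive)."""
    seen = set()
    for header, seq in records:
        # Store a fixed-size digest rather than the full sequence.
        digest = hashlib.sha256(seq.upper().encode()).digest()
        if digest not in seen:
            seen.add(digest)
            yield header, seq


def filter_by_length(records, min_len):
    """Keep records whose sequence has at least min_len residues."""
    for header, seq in records:
        if len(seq) >= min_len:
            yield header, seq


# Hypothetical input: seq2 repeats seq1 in lower case, seq3 is too short.
EXAMPLE = """>seq1 first
ACGTACGT
ACGT
>seq2 duplicate of seq1
acgtacgtacgt
>seq3 short
ACG
"""

records = read_fasta(StringIO(EXAMPLE))
kept = list(filter_by_length(dedup_by_sequence(records), min_len=5))
for header, seq in kept:
    print(f">{header}\n{seq}")
```

Because each step is a generator, records stream through one at a time, so only the set of sequence digests grows with the size of the input.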
Conflict of interest statement
The authors have declared that no competing interests exist.
