RAPSearch2: a fast and memory-efficient protein similarity search tool for next-generation sequencing data
- PMID: 22039206
- PMCID: PMC3244761
- DOI: 10.1093/bioinformatics/btr595
Abstract
Summary: With the wide application of next-generation sequencing (NGS) techniques, fast tools for protein similarity search that scale well to large query datasets and large databases are highly desirable. In previous work, we developed RAPSearch, an algorithm that achieved a ~20-90-fold speedup relative to BLAST while retaining similar sensitivity for short protein fragments derived from NGS data. RAPSearch, however, requires a substantial memory footprint to identify alignment seeds because it uses a suffix array data structure. Here we present RAPSearch2, a new memory-efficient implementation of the RAPSearch algorithm that uses a collision-free hash table to index a similarity search database. The optimized data structure speeds up the similarity search by a further 2-3 times. We also implemented multi-threading in RAPSearch2, and the multi-thread modes achieve significant acceleration (e.g. 3.5X for 4-thread mode). RAPSearch2 requires up to 2 GB of memory when running in single-thread mode, or up to 3.5 GB of memory when running in 4-thread mode.
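To illustrate what a collision-free hash table over a sequence database can look like, the following Python sketch encodes each fixed-length k-mer as a base-20 integer, so every distinct k-mer maps to its own table slot and no collision handling is needed. This is a minimal illustration under stated assumptions, not RAPSearch2's actual C++ implementation: the seed length `K = 3`, the use of the full 20-letter amino acid alphabet, exact-match seeds, and the function names `kmer_key`, `build_index` and `find_seeds` are all hypothetical choices made for the example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (assumed alphabet)
CODE = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
K = 3  # hypothetical seed length, chosen only to keep the table small


def kmer_key(kmer):
    """Encode a k-mer as a base-20 integer; distinct k-mers get distinct keys."""
    key = 0
    for aa in kmer:
        key = key * len(AMINO_ACIDS) + CODE[aa]
    return key


def build_index(database, k=K):
    """Direct-address table: slot kmer_key(kmer) holds (seq_id, offset) pairs."""
    table = [[] for _ in range(len(AMINO_ACIDS) ** k)]
    for seq_id, seq in enumerate(database):
        for pos in range(len(seq) - k + 1):
            kmer = seq[pos:pos + k]
            # Skip k-mers containing non-standard residues such as 'X'.
            if all(aa in CODE for aa in kmer):
                table[kmer_key(kmer)].append((seq_id, pos))
    return table


def find_seeds(query, table, k=K):
    """Return (query_offset, seq_id, db_offset) for every exact k-mer match."""
    seeds = []
    for qpos in range(len(query) - k + 1):
        kmer = query[qpos:qpos + k]
        if all(aa in CODE for aa in kmer):
            for seq_id, dpos in table[kmer_key(kmer)]:
                seeds.append((qpos, seq_id, dpos))
    return seeds


database = ["MKVLAAGIV", "GGMKVQ"]
index = build_index(database)
seeds = find_seeds("AMKVL", index)
```

Because the key is computed arithmetically from the k-mer itself, a lookup is a single array access. The table size grows as 20^k, so a practical index has to balance seed length against memory.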
Availability and implementation: Implemented in C++, the source code is freely available for download at the RAPSearch2 website: http://omics.informatics.indiana.edu/mg/RAPSearch2/.
Contact: yye@indiana.edu
Supplementary information: Available at the RAPSearch2 website.
Similar articles
- RAPSearch: a fast protein similarity search tool for short reads. BMC Bioinformatics. 2011 May 15;12:159. doi: 10.1186/1471-2105-12-159. PMID: 21575167.
- SWORD-a highly efficient protein database search. Bioinformatics. 2016 Sep 1;32(17):i680-i684. doi: 10.1093/bioinformatics/btw445. PMID: 27587689.
- muBLASTP: database-indexed protein sequence search on multicore CPUs. BMC Bioinformatics. 2016 Nov 4;17(1):443. doi: 10.1186/s12859-016-1302-4. PMID: 27809763.
- Review of alignment and SNP calling algorithms for next-generation sequencing data. J Appl Genet. 2016 Feb;57(1):71-9. doi: 10.1007/s13353-015-0292-7. Epub 2015 Jun 9. PMID: 26055432. Review.
- Identifying local associations in biological time series: algorithms, statistical significance, and applications. Brief Bioinform. 2023 Sep 22;24(6):bbad390. doi: 10.1093/bib/bbad390. PMID: 37930023. Review.
Cited by
- PhIP-Seq: methods, applications and challenges. Front Bioinform. 2024 Sep 4;4:1424202. doi: 10.3389/fbinf.2024.1424202. eCollection 2024. PMID: 39295784. Review.
- The dynamics of the midgut microbiome in Aedes aegypti during digestion reveal putative symbionts. PNAS Nexus. 2024 Aug 1;3(8):pgae317. doi: 10.1093/pnasnexus/pgae317. eCollection 2024 Aug. PMID: 39157462.
- A large-scale assessment of sequence database search tools for homology-based protein function prediction. Brief Bioinform. 2024 May 23;25(4):bbae349. doi: 10.1093/bib/bbae349. PMID: 39038936.
- A survey of k-mer methods and applications in bioinformatics. Comput Struct Biotechnol J. 2024 May 21;23:2289-2303. doi: 10.1016/j.csbj.2024.05.025. eCollection 2024 Dec. PMID: 38840832. Review.
- Species-level characterization of saliva and dental plaque microbiota reveals putative bacterial and functional biomarkers of periodontal diseases in dogs. FEMS Microbiol Ecol. 2024 May 14;100(6):fiae082. doi: 10.1093/femsec/fiae082. PMID: 38782729.
