Search and clustering orders of magnitude faster than BLAST
- PMID: 20709691
- DOI: 10.1093/bioinformatics/btq461
Abstract
Motivation: Biological sequence data is accumulating rapidly, motivating the development of improved high-throughput methods for sequence classification.
Results: UBLAST and USEARCH are new algorithms enabling sensitive local and global search of large sequence databases at exceptionally high speeds. They are often orders of magnitude faster than BLAST in practical applications, though sensitivity to distant protein relationships is lower. UCLUST is a new clustering method that exploits USEARCH to assign sequences to clusters. UCLUST offers several advantages over the widely used program CD-HIT, including higher speed, lower memory use, improved sensitivity, clustering at lower identities and classification of much larger datasets.
Availability: Binaries are available at no charge for non-commercial use at http://www.drive5.com/usearch.
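The Results describe UCLUST as using a search step to assign sequences to clusters, but the abstract does not spell out the procedure. The sketch below is only an illustration under stated assumptions: it assumes a greedy, centroid-based scheme in which each sequence joins the most similar existing cluster representative if the similarity meets a threshold, and otherwise starts a new cluster. The functions `kmer_similarity` and `greedy_cluster` are hypothetical names, and the toy k-mer overlap score stands in for the USEARCH search step; it is not the scoring used by the actual program.

```python
from typing import Callable, List, Tuple


def kmer_similarity(a: str, b: str, k: int = 3) -> float:
    """Toy stand-in for a sequence search score (assumption, not USEARCH):
    Jaccard overlap of the two sequences' k-mer sets."""
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    if not kmers_a or not kmers_b:
        return 0.0
    return len(kmers_a & kmers_b) / len(kmers_a | kmers_b)


def greedy_cluster(
    seqs: List[str],
    threshold: float,
    similarity: Callable[[str, str], float] = kmer_similarity,
) -> List[Tuple[str, List[str]]]:
    """Assign each sequence to the best-matching centroid, or open a new cluster.

    Returns a list of (centroid, members) pairs in order of creation.
    """
    centroids: List[str] = []
    clusters: List[List[str]] = []
    for seq in seqs:
        best_idx, best_score = -1, 0.0
        # Search the current centroids for the most similar one.
        for i, centroid in enumerate(centroids):
            score = similarity(seq, centroid)
            if score > best_score:
                best_idx, best_score = i, score
        if best_idx >= 0 and best_score >= threshold:
            clusters[best_idx].append(seq)
        else:
            # No centroid is similar enough: this sequence seeds a new cluster.
            centroids.append(seq)
            clusters.append([seq])
    return list(zip(centroids, clusters))


if __name__ == "__main__":
    example = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC"]
    for centroid, members in greedy_cluster(example, threshold=0.5):
        print(centroid, members)
```

In a scheme like this the result depends on input order, because earlier sequences become centroids; the inner loop over centroids is the part a fast search method would replace.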
