Bioinformatics, 17 (3), 282-3

Clustering of Highly Homologous Sequences to Reduce the Size of Large Protein Databases

W Li et al. Bioinformatics.

Abstract

We present a fast and flexible program for clustering large protein databases at different sequence identity levels. It takes less than 2 h for the all-against-all sequence comparison and clustering of the non-redundant protein database of over 560,000 sequences on a high-end PC. The output database, including only the representative sequences, can be used for more efficient and sensitive database searches.
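The abstract does not describe the clustering procedure itself. As a rough illustration only, the Python sketch below shows greedy incremental clustering, one common way to choose representative sequences. Sequences are processed from longest to shortest. Each one joins the first existing representative that it matches at or above an identity threshold, and otherwise becomes a new representative. Everything here is a hypothetical choice for the example, not the program's actual method or parameters: the identity measure `lcs_identity` (longest common subsequence length divided by the shorter sequence's length), the function names, the toy sequences and the 0.9 threshold.

```python
from typing import Dict, List


def lcs_identity(a: str, b: str) -> float:
    """Hypothetical identity measure: LCS length / length of the shorter sequence.

    This is a simplified stand-in for alignment-based percent identity.
    """
    if not a or not b:
        return 0.0
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1] / min(len(a), len(b))


def greedy_cluster(seqs: Dict[str, str], threshold: float) -> Dict[str, List[str]]:
    """Greedy incremental clustering; returns {representative: [members]}."""
    clusters: Dict[str, List[str]] = {}
    # Longest first, so each representative is at least as long as its members.
    order = sorted(seqs, key=lambda name: len(seqs[name]), reverse=True)
    for name in order:
        for rep in clusters:
            if lcs_identity(seqs[name], seqs[rep]) >= threshold:
                clusters[rep].append(name)  # close enough: join this cluster
                break
        else:
            clusters[name] = [name]  # no representative matched: start a new cluster
    return clusters


if __name__ == "__main__":
    db = {
        "p1": "MKTAYIAKQRQISFVKSHFSRQ",
        "p2": "MKTAYIAKQRQISFVKSHFSR",  # p1 without its last residue
        "p3": "GSHMLEDPVDAFQLAKGRQ",
    }
    print(greedy_cluster(db, threshold=0.9))
    # prints {'p1': ['p1', 'p2'], 'p3': ['p3']}
```

In this sketch only the dictionary keys (the representatives) would be written to the reduced output database. Comparing every sequence against every representative with a quadratic-time identity check would be far too slow for hundreds of thousands of sequences, so a practical tool would need a fast filter that skips most full comparisons.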
