Assessment of the parallelization approach of d2_cluster for high-performance sequence clustering

J Comput Chem. 2002 May;23(7):755-7. doi: 10.1002/jcc.10025.

Abstract

The exponential increase in expressed sequence tag (EST) sequence data raises the computational cost of clustering sequences, so faster algorithms are required to keep pace with the data. We have parallelized d2_cluster on an SGI Origin 2000 multiprocessor and observed a speedup of approximately 100x on 126 processors when processing a dataset of 15,876 ESTs. The parallelized d2_cluster code is obtainable from the SANBI website (http://www.sanbi.ac.za/CODES).
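
To put the reported speedup in context, the short sketch below computes parallel efficiency (speedup divided by processor count) from the two figures given in the abstract. The function name `parallel_efficiency` and the constant names are illustrative and not part of d2_cluster. The speedup is stated only as approximate, so the result is approximate as well.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Return speedup divided by processor count (1.0 means ideal linear scaling)."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors


# Figures reported in the abstract; the speedup is given as approximate.
REPORTED_SPEEDUP = 100.0
REPORTED_PROCESSORS = 126

efficiency = parallel_efficiency(REPORTED_SPEEDUP, REPORTED_PROCESSORS)
print(f"Parallel efficiency: {efficiency:.1%}")
```

Under these figures the efficiency comes out at roughly 79%, meaning each of the 126 processors contributes on average about four fifths of what it would under ideal linear scaling.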