BACKGROUND: Many different test statistics have been proposed to test for spatial clustering. Some of these statistics have been widely used in various applications. In this paper, we use an existing collection of 1,220,000 simulated benchmark data, generated under 51 different clustering models, to compare the statistical power of several disease clustering tests. These tests are Besag-Newell's R, Cuzick-Edwards' k-Nearest Neighbors (k-NN), the spatial scan statistic, Tango's Maximized Excess Events Test (MEET), Swartz' entropy test, Whittemore's test, Moran's I and a modification of Moran's I. RESULTS: Except for Moran's I and Whittemore's test, all other tests have good power for detecting some kind of clustering. The spatial scan statistic is good at detecting localized clusters. Tango's MEET is good at detecting global clustering. With appropriate choice of parameter, Besag-Newell's R and Cuzick-Edwards' k-NN also perform well. CONCLUSION: The power varies greatly for different test statistics and alternative clustering models. Consideration of the power is important before we decide which test statistic to use.