Identifying crash "hotspots", "blackspots", "sites with promise", or "high risk" locations is standard practice in departments of transportation throughout the US. The literature is replete with the development and discussion of statistical methods for hotspot identification (HSID). Theoretical derivations and empirical studies have been used to weigh the benefits of various HSID methods; however, only a small number of studies have used controlled experiments to assess them systematically. Using experimentally derived simulated data, which are argued to be superior to empirical data, three HSID methods observed in practice are evaluated: simple ranking, confidence interval, and Empirical Bayes. With simulated data, sites with promise are known a priori, in contrast to empirical data, where high risk sites are not known with certainty. To conduct the evaluation, properties of observed crash data are used to generate simulated crash frequency distributions at hypothetical sites. A variety of factors are manipulated to simulate a host of 'real world' conditions. Various levels of confidence are explored, and false positives (identifying a safe site as high risk) and false negatives (identifying a high risk site as safe) are compared across methods. Finally, the effects of crash history duration on the three HSID approaches are assessed. The results illustrate that the Empirical Bayes technique significantly outperforms the ranking and confidence interval techniques (with certain caveats). As found by others, false positives and false negatives are inversely related. Three years of crash history appears, in general, to provide an appropriate crash history duration.
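The simulation logic described above can be illustrated with a minimal sketch. It is not the experimental design used in the study. It assumes gamma-distributed true crash rates, Poisson yearly counts, and arbitrary parameter values, and it covers only the simple ranking method. All function and parameter names (`evaluate_simple_ranking`, `true_fraction`, `flag_fraction`, and so on) are hypothetical. Because the true rates are generated, the truly high risk sites are known, so false positives and false negatives can be counted directly.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson variate (Knuth's method; adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def top_indices(scores, fraction):
    """Return the indices of the sites with the highest scores."""
    n_top = max(1, round(fraction * len(scores)))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:n_top])


def evaluate_simple_ranking(n_sites=500, years=3, true_fraction=0.10,
                            flag_fraction=0.10, shape=2.0, scale=1.5, seed=0):
    """Simulate sites and count simple-ranking errors.

    Assumed setup (illustrative only): true yearly crash rates are
    gamma(shape, scale); observed yearly counts are Poisson around them.
    """
    rng = random.Random(seed)
    true_means = [rng.gammavariate(shape, scale) for _ in range(n_sites)]
    # Total observed crashes per site over the crash history period.
    counts = [sum(poisson(rng, m) for _ in range(years)) for m in true_means]

    truly_high = top_indices(true_means, true_fraction)  # known a priori
    flagged = top_indices(counts, flag_fraction)         # simple ranking

    false_pos = len(flagged - truly_high)  # safe site flagged as high risk
    false_neg = len(truly_high - flagged)  # high risk site missed
    return false_pos, false_neg


if __name__ == "__main__":
    # Flagging more sites trades false negatives for false positives.
    for frac in (0.05, 0.10, 0.20):
        fp, fn = evaluate_simple_ranking(flag_fraction=frac)
        print(f"flag_fraction={frac:.2f}  false_pos={fp}  false_neg={fn}")
    # Effect of crash history duration at a fixed flagging fraction.
    for years in (1, 3, 5):
        fp, fn = evaluate_simple_ranking(years=years)
        print(f"years={years}  false_pos={fp}  false_neg={fn}")
```

In this sketch the flagged fraction stands in for the confidence level: flagging more sites cannot raise the number of false negatives and cannot lower the number of false positives. This mirrors the inverse relationship noted above, but only under the stated assumptions.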