Although studies have examined the effects of a variety of factors on the comparability of scores obtained from standardized patient examinations (SPEs), little research has specifically investigated the challenge of detecting drift in case difficulty estimates over time, particularly for large-scale, performance-based assessments. The purpose of the current study was to investigate the use of a procedure to detect drift in the difficulty estimates for a large-scale, high-stakes SPE. The results suggest that, for particular performance tasks, there was some variation in mean scores over time. These findings indicate that, although it is feasible to create a bank of case-SP means and link scores back to these fixed estimates, special attention must be paid to the standardization of exam materials over time. Such standardization is essential to ensure comparability of scores and pass-fail decisions for candidates who are assessed on multiple test forms throughout the year.