With the proliferation of commercial experiment generators and custom software within cognitive psychology and the behavioral sciences, many researchers have assumed that millisecond timing accuracy is largely a solved problem. However, through empirical investigation of a variety of paradigms, we have discovered numerous sources of timing error. These range from poor scripting practices, through incorrect timing specifications, to hardware variability. Building upon earlier research, we have developed a commercial device and associated software that enable researchers to benchmark most computer-based paradigms in situ and without modification. Researchers can then correct timing errors where practicable, improving replicability and reducing variability, by adjusting stimulus onset times, by replacing inaccurate hardware, or, where the source of error is constant, by applying a post hoc statistical correction. We outline the features of the device and its accompanying software suite, stress the importance of such independent validation, and highlight typical areas that are prone to timing error.
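The post hoc option works only when the benchmarked error is effectively constant. The sketch below assumes a particular setup: a set of lags (in ms) by which each recorded response time exceeds the true one, measured against an external reference. The function name `correct_constant_lag` and the 1 ms spread threshold are hypothetical illustrations, not part of the device's software. A constant lag can simply be subtracted. A variable lag cannot, so the sketch refuses to correct in that case.

```python
from statistics import mean, stdev


def correct_constant_lag(recorded_rts_ms, measured_lags_ms, max_sd_ms=1.0):
    """Subtract a constant timing lag from recorded response times.

    Hypothetical helper. measured_lags_ms are assumed benchmark measurements of
    how much the software's recorded RT exceeds the externally measured RT.
    The correction is applied only if those lags are roughly constant
    (standard deviation <= max_sd_ms), because a variable lag cannot be
    removed by subtracting a single value.
    """
    lag_mean = mean(measured_lags_ms)
    lag_sd = stdev(measured_lags_ms)
    if lag_sd > max_sd_ms:
        raise ValueError(
            f"lag is not constant (SD = {lag_sd:.2f} ms); "
            "subtraction would not remove it"
        )
    # Shift every recorded RT by the same estimated constant lag.
    return [rt - lag_mean for rt in recorded_rts_ms]


if __name__ == "__main__":
    # Illustrative numbers only, not measured data.
    lags = [18.0, 18.5, 17.5, 18.0]
    print(correct_constant_lag([412.0, 398.0], lags))
```

Subtraction shifts the whole distribution without changing its spread. It therefore corrects absolute latencies but leaves any random, trial-to-trial timing variability in place.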