When the safety of the public is at stake, it is particularly important for licensing and credentialing exam agencies to use defensible standard setting methods to classify candidates into competence categories (e.g., pass/fail). The aim of this study was to gather evidence to support a change to the standard setting design and administrative process of the Comprehensive Osteopathic Medical Licensing Examination-USA Level 2-Performance Evaluation. Twenty-two video recordings of candidates assessed for clinical competence were randomly selected from the 2014-2015 Humanistic domain test score distribution, ranging from the highest to the lowest quintile of performance. Nineteen panelists convened at the same site to receive training and practice before judging the performance in each of the twenty videos as qualified or not qualified. At the end of training, one panel remained onsite to complete its judgments, and the second panel was released and given 1 week to observe the same twenty videos and complete its judgments offsite. The two one-sided tests procedure established equivalence between panel group means at the 0.05 significance level, controlling for rater errors within each panel group. From a cost and administrative resource perspective, the results suggest it is possible to move away from typical panel groups, who are sequestered onsite the entire time, toward larger numbers of panelists who make their judgments offsite, with little impact on judged samples of qualified performance. Standard setting designs in which panelists train together and some then provide their judgments offsite yield equivalent ratings and, ultimately, similar cut scores.
Keywords: Clinical skills; Equivalence test; Many-facet Rasch measurement; Standard setting.
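As a rough illustration of the two one-sided tests (TOST) logic referred to in the abstract, the following minimal Python sketch tests whether two independent panel means differ by less than a chosen equivalence margin. Everything specific here is an assumption made for illustration: the function names `t_cdf` and `tost_independent`, the panel sizes, the panelist-level values, and the margin of 0.10 are hypothetical and do not come from the study. The sketch also uses a plain pooled-variance t test on raw values; it does not reproduce the many-facet Rasch adjustment for rater errors described above.

```python
import math
from statistics import mean, variance


def t_cdf(t, df, steps=2000):
    """Student's t CDF, integrating the density with Simpson's rule (stdlib only)."""
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    coef /= math.sqrt(df * math.pi)

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    h = abs(t) / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # integral of the density from 0 to |t|
    return 0.5 + area if t >= 0 else 0.5 - area


def tost_independent(x, y, margin, alpha=0.05):
    """Pooled-variance TOST for two independent samples.

    Returns (mean difference, TOST p-value, equivalence decision).
    Equivalence is claimed only if BOTH one-sided nulls are rejected.
    """
    n1, n2 = len(x), len(y)
    diff = mean(x) - mean(y)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    # H0 (lower): diff <= -margin, rejected when the statistic is large
    p_lower = 1 - t_cdf((diff + margin) / se, df)
    # H0 (upper): diff >= +margin, rejected when the statistic is small
    p_upper = t_cdf((diff - margin) / se, df)
    p_value = max(p_lower, p_upper)
    return diff, p_value, p_value < alpha


# Hypothetical data: proportion of videos each panelist judged "qualified".
onsite = [0.60, 0.55, 0.65, 0.60, 0.70, 0.55, 0.60, 0.65, 0.60, 0.55]
offsite = [0.60, 0.65, 0.55, 0.60, 0.60, 0.70, 0.55, 0.65, 0.60]

diff, p_value, equivalent = tost_independent(onsite, offsite, margin=0.10)
print(f"difference={diff:.4f}, TOST p={p_value:.4g}, equivalent={equivalent}")
```

The reported p-value is the larger of the two one-sided p-values, so a result below 0.05 means both one-sided null hypotheses are rejected at that significance level. The conclusion depends heavily on the equivalence margin, which should be justified on substantive grounds before the data are examined.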