Purpose: To explore methods for assessing standardized patient (SP) performance over time in order to sustain objective performance in a high-stakes clinical skills examination and to inform quality assurance.
Method: The authors selected data from the United States Medical Licensing Examination Step 2 Clinical Skills to assess the relative usefulness of the classical measurement model and the common factor model in determining the difficulty and discrimination of SP-medical case pairs (SP-cases) on communication scores over time. The common factor model, an alternative to the classical measurement model, can be used to calibrate SP-case parameters. The sample comprised 88 SP-case combinations from test administrations throughout 2010. The authors constructed four time segments from scoring cohorts; computed, for each method, difficulty and discrimination parameters for each SP-case within each time segment; and then compared the efficacy of the two methods. They also compared qualitative SP-case performance standards established through video monitoring with the common factor model for relative usefulness in identifying SP-case outliers.
Results: SP-case difficulty parameters produced by the classical measurement and common factor models were similarly useful for SP performance evaluation over time. The SP-case discrimination parameters produced by the common factor model appeared to capture more variation in performance than those produced by the classical measurement model.
Conclusions: Although the two methods are equally useful for assessing SP-case difficulty, the common factor model is more sensitive to fluctuations in SP-case discrimination and could be an additional source of information for identifying outliers and directing quality assurance resources for routine SP evaluation.
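
Illustrative sketch: The abstract does not state how the classical measurement parameters were computed. As a minimal sketch under assumed conventions, the Python below treats classical "difficulty" as the mean communication score for an SP-case within a time segment and classical "discrimination" as the correlation between that score and the examinee's mean score on all other encounters (a corrected item-total correlation). The record layout, the function names `pearson` and `classical_parameters`, and both definitions are assumptions for illustration, not the authors' procedure; the common factor model calibration is not shown.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; None when undefined."""
    n = len(xs)
    if n < 2:
        return None
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # no variance, so the correlation is undefined
    return sxy / sqrt(sxx * syy)


def classical_parameters(records):
    """Classical parameters per (sp_case, segment).

    records: iterable of (examinee, sp_case, segment, score) tuples.
    This layout is hypothetical; it is not taken from the source.
    """
    by_examinee = defaultdict(list)
    scores = defaultdict(list)
    for examinee, sp_case, segment, score in records:
        by_examinee[examinee].append((sp_case, segment, score))
        scores[(sp_case, segment)].append(score)

    # Pair each encounter score with the examinee's mean on all OTHER
    # encounters, so a score is never correlated with itself.
    own = defaultdict(list)
    rest = defaultdict(list)
    for encounters in by_examinee.values():
        if len(encounters) < 2:
            continue  # no "rest" score is available for this examinee
        total = sum(score for _, _, score in encounters)
        for sp_case, segment, score in encounters:
            key = (sp_case, segment)
            own[key].append(score)
            rest[key].append((total - score) / (len(encounters) - 1))

    return {
        key: {
            # Mean score: higher values indicate an easier SP-case.
            "difficulty": sum(values) / len(values),
            "discrimination": pearson(own[key], rest[key]),
            "n": len(values),
        }
        for key, values in scores.items()
    }
```

Comparing the values returned for the same SP-case across the four time segments would give one rough view of drift in difficulty or discrimination; any threshold for flagging an outlier would need to be set separately.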