Weighting checklist items and station components on a large-scale OSCE: is it worth the effort?

Med Teach. 2014 Jul;36(7):585-90. doi: 10.3109/0142159X.2014.899687. Epub 2014 May 2.

Abstract

Background: Past research suggests that the use of externally applied scoring weights may not appreciably affect measurement qualities such as reliability or validity. Nonetheless, some credentialing boards and academic institutions apply differential scoring weights based on expert opinion about the relative importance of individual items or test components of Objective Structured Clinical Examinations (OSCEs).

Aims: To investigate the impact of simplified scoring models that make little to no use of differential weighting on the reliability of scores and decisions on a high-stakes OSCE required for medical licensure in Canada.

Method: We applied four weighting models of varying complexity to data from three administrations of the OSCE. We compared score reliability, pass/fail rates, correlations between scores, and classification decision accuracy and consistency across the models and administrations.
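
As a rough illustration of the kind of comparison described above (the abstract does not specify the actual weighting models or reliability indices), the following Python sketch scores simulated checklist data under a hypothetical expert-weighted scheme and a flat unit-weighted scheme, then compares station-level reliability (Cronbach's alpha) and the correlation between total scores. All names, weights, and data are assumptions, not the study's materials.

    import numpy as np

    def cronbach_alpha(scores):
        """Cronbach's alpha for an (n_candidates, n_parts) score matrix."""
        scores = np.asarray(scores, dtype=float)
        k = scores.shape[1]
        part_vars = scores.var(axis=0, ddof=1)
        total_var = scores.sum(axis=1).var(ddof=1)
        return (k / (k - 1)) * (1 - part_vars.sum() / total_var)

    def station_score(items, weights=None):
        """Weighted mean of checklist items; equal weights when none given."""
        items = np.asarray(items, dtype=float)
        if weights is None:
            weights = np.ones(items.shape[1])
        weights = np.asarray(weights, dtype=float)
        return items @ (weights / weights.sum())

    # Simulated data: 500 candidates x 10 stations x 8 binary checklist items,
    # driven by a shared candidate ability so station scores correlate.
    rng = np.random.default_rng(0)
    ability = rng.normal(0.0, 1.0, size=(500, 1, 1))
    difficulty = rng.normal(0.0, 0.5, size=(1, 10, 8))
    p_correct = 1 / (1 + np.exp(-(ability - difficulty)))
    data = (rng.random((500, 10, 8)) < p_correct).astype(float)

    # Hypothetical expert item weights vs. a flat unit-weight scheme.
    expert_w = rng.uniform(1.0, 3.0, size=8)
    weighted = np.stack([station_score(data[:, s], expert_w) for s in range(10)], axis=1)
    flat = np.stack([station_score(data[:, s]) for s in range(10)], axis=1)

    print("alpha, weighted: ", round(cronbach_alpha(weighted), 3))
    print("alpha, flat:     ", round(cronbach_alpha(flat), 3))
    print("total-score corr:", round(np.corrcoef(weighted.sum(1), flat.sum(1))[0, 1], 3))

Because the weights are applied within stations and all items load on the same underlying ability, the two total scores correlate very highly, which is the general pattern the study reports.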

Results: The less complex weighting models yielded reliability and pass rates similar to those of the more complex weighting model. Minimal changes in candidates' pass/fail status were observed, and correlations between scores were strong and statistically significant for all scoring models and administrations. Classification decision accuracy and consistency were very high and similar across the four scoring models.
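
One simple way to gauge the stability of pass/fail status across scoring models is the raw agreement rate at a common cut score, sketched below in Python. The cut score, score distributions, and agreement statistic here are illustrative only; the abstract does not detail the exam's standard-setting method or the specific decision accuracy and consistency indices used.

    import numpy as np

    rng = np.random.default_rng(1)
    # Two total-score vectors for the same 500 candidates under two highly
    # correlated scoring models, as the study reports (values illustrative).
    true_score = rng.normal(70.0, 8.0, 500)
    model_a = true_score + rng.normal(0.0, 1.5, 500)
    model_b = true_score + rng.normal(0.0, 1.5, 500)

    cut = 60.0  # illustrative cut score; the exam's actual standard is not given
    agreement = ((model_a >= cut) == (model_b >= cut)).mean()
    print("pass/fail agreement between models:", round(agreement, 3))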

Conclusions: Adopting a simplified weighting scheme for this OSCE did not diminish its measurement qualities. Rather than being devoted to complex weighting schemes, experts' time and effort could be better spent on other critical test development and assembly tasks, with little to no compromise in the quality of scores and decisions on this high-stakes OSCE.

MeSH terms

  • Canada
  • Checklist
  • Clinical Competence / standards*
  • Educational Measurement / methods
  • Educational Measurement / standards*
  • Educational Measurement / statistics & numerical data
  • Humans
  • Licensure, Medical / standards*
  • Models, Educational
  • Reproducibility of Results