Purpose: To compare procedure-specific checklists and a global rating scale in assessing technical competence.
Method: Two trained raters used procedure-specific checklists and a global rating scale to independently evaluate 218 video-recorded performances of six bedside procedures of varying complexity for technical competence. The procedures were completed by 47 residents participating in a formative simulation-based objective structured clinical examination at the University of Calgary in 2011. Pass/fail (competent/not competent) decisions were based on an overall global assessment item on the global rating scale. Raters provided written comments on performances they deemed not competent. Checklist minimum passing levels were set using traditional standard-setting methods.
Results: For each procedure, the global rating scale demonstrated higher internal reliability and lower interrater reliability than the checklist. However, interrater reliability was almost perfect for decisions on competence using the overall global assessment (kappa range: 0.84-1.00). Clinically significant procedural errors were most often cited as reasons for ratings of not competent. Using checklist scores to diagnose competence demonstrated acceptable discrimination: The area under the receiver operating characteristic curve ranged from 0.84 (95% CI 0.72-0.97) to 0.93 (95% CI 0.82-1.00). Checklist minimum passing levels demonstrated high sensitivity but low specificity for diagnosing competence.
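As a rough illustration of the interrater statistic reported above, the following minimal sketch computes Cohen's kappa for two raters' pass/fail decisions. The function name `cohens_kappa` and the ten decisions are hypothetical and are not data from the study; the abstract does not state which kappa variant or software was used, so an unweighted Cohen's kappa is assumed here.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters judging the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions,
    # summed over categories.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used the same single category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical pass/fail decisions for ten performances (illustration only).
rater_1 = ["pass", "pass", "fail", "pass", "fail",
           "pass", "pass", "fail", "pass", "pass"]
rater_2 = ["pass", "pass", "fail", "pass", "pass",
           "pass", "pass", "fail", "pass", "pass"]

kappa = cohens_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")
```

In this toy example the raters agree on 9 of 10 performances, yet kappa is noticeably lower than 0.90 because part of that agreement is expected by chance.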
Conclusions: Assessment using a global rating scale may be superior to assessment using a checklist for evaluation of technical competence. Traditional standard-setting methods may establish checklist cut scores whose specificity is too low: High checklist scores did not rule out incompetence. The role of clinically significant errors in determining procedural competence should be further evaluated.
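To make the point about cut-score specificity concrete, the sketch below treats the global pass/fail decision as the reference standard and the checklist score as the diagnostic test. The function names, the scores, and the cut score of 75 are hypothetical and are not taken from the study; "positive" is assumed to mean "competent", and the area under the curve is computed with the rank-based (Mann-Whitney) formulation.

```python
def sensitivity_specificity(scores, competent, cut_score):
    """Sensitivity and specificity of 'score >= cut_score' for competence."""
    tp = fn = tn = fp = 0
    for score, is_competent in zip(scores, competent):
        passed = score >= cut_score
        if is_competent and passed:
            tp += 1          # competent, passes the checklist
        elif is_competent:
            fn += 1          # competent, fails the checklist
        elif passed:
            fp += 1          # not competent, yet passes the checklist
        else:
            tn += 1          # not competent, fails the checklist
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, competent):
    """Probability that a competent performance outscores a not-competent
    one, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for s, c in zip(scores, competent) if c]
    neg = [s for s, c in zip(scores, competent) if not c]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical checklist scores (percent of items completed) and
# global-rating decisions for ten performances (illustration only).
scores = [95, 90, 88, 85, 80, 92, 84, 78, 70, 60]
competent = [True, True, True, True, True,
             False, False, False, False, False]

sens, spec = sensitivity_specificity(scores, competent, cut_score=75)
area = auc(scores, competent)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={area:.2f}")
```

In this toy data every competent performance clears the cut score, but three of the five not-competent performances clear it as well, which is the pattern of high sensitivity with low specificity described above.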