Background: The technical skill of surgical trainees is not well assessed. This study aimed (1) to compare the reliability of three scoring systems, (2) to compare live and bench formats and (3) to assess construct validity of a test of operative skill.
Methods: Parallel examinations of operative skill, one using live animals and one using simulations, were developed. Performance was graded using operation-specific checklists, detailed global rating forms and pass/fail judgements. Twenty surgical residents each took both formats.
Results: Disattenuated correlations between live and bench scores were high (0.69-0.72). Mean interrater reliability across stations ranged from 0.64 to 0.72. Internal consistency was moderate to high (alpha: 0.61-0.74) for the live format using the checklist and for live and bench formats using global ratings. Global ratings discriminated between resident levels for both formats (bench: F(2,17) = 4.45, P < 0.05; live: F(2,17) = 3.55, P < 0.05); checklists did not.
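The two reliability statistics named in the Results can be illustrated with a minimal Python sketch. It assumes that "alpha" refers to Cronbach's alpha and that disattenuation uses the standard correction for attenuation (the observed correlation divided by the square root of the product of the two reliabilities); the abstract does not state the exact procedure used. The function names and all example numbers are hypothetical and are not the study's data.

```python
import math
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix.

    scores: list of rows, one per examinee; each row holds one score
    per station (hypothetical layout, not taken from the study).
    """
    k = len(scores[0])  # number of stations (items)
    # Sample variance of each station's scores across examinees.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each examinee's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def disattenuate(r_observed, rel_x, rel_y):
    """Correct an observed correlation for unreliability in both measures."""
    return r_observed / math.sqrt(rel_x * rel_y)


if __name__ == "__main__":
    # Hypothetical scores: 4 examinees x 3 stations.
    example_scores = [
        [3, 4, 3],
        [2, 2, 3],
        [5, 4, 4],
        [1, 2, 2],
    ]
    print("alpha:", cronbach_alpha(example_scores))

    # Hypothetical observed correlation and reliabilities.
    print("disattenuated r:", disattenuate(0.5, 0.64, 0.64))
```

Because the correction divides by a quantity no greater than 1, a disattenuated correlation is never smaller in magnitude than the observed one, which is why it is read as an estimate of the relationship between the two formats with measurement error removed.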
Conclusion: This preliminary study suggests that the Objective Structured Assessment of Technical Skill can reliably and validly assess surgical skills. Global ratings are a better method of assessment than task-specific checklists. For this test format, bench model simulation gives results equivalent to those obtained with live animals.