Background: Crowdsourcing is the practice of obtaining services from a large group of people, typically an online community. Validated methods of evaluating surgical video are time-intensive, expensive, and involve participation of multiple expert surgeons. We sought to obtain valid performance scores of urologic trainees and faculty on a dry-laboratory robotic surgery task module by using crowdsourcing through a web-based grading tool called Crowd Sourced Assessment of Technical Skill (CSATS).
Methods: IRB approval was granted to test the technical skills grading accuracy of Amazon.com Mechanical Turk™ crowd-workers compared to three expert faculty surgeon graders. The two groups assessed dry-laboratory robotic surgical suturing performances of three urology residents (PGY-2, -4, -5) and two faculty using three performance domains from the validated Global Evaluative Assessment of Robotic Skills assessment tool.
Results: After an average of 2 hours 50 minutes, each of the five videos received 50 crowd-worker assessments. The inter-rater reliability (IRR) between the surgeons and crowd was 0.91 using Cronbach's alpha statistic (confidence intervals=0.20-0.92), indicating an agreement level between the two groups of "excellent." The crowds were able to discriminate the surgical level, and both the crowds and the expert faculty surgeon graders scored one senior trainee's performance above a faculty's performance.
Conclusion: Surgery-naive crowd-workers can rapidly assess varying levels of surgical skill accurately relative to a panel of faculty raters. The crowds provided rapid feedback and were inexpensive. CSATS may be a valuable adjunct to surgical simulation training as requirements for more granular and iterative performance tracking of trainees become mandated and commonplace.