The reliability of operative rating tool evaluations: How late is too late to provide operative performance feedback?

Am J Surg. 2018 Dec;216(6):1052-1055. doi: 10.1016/j.amjsurg.2018.04.005. Epub 2018 Apr 17.


Background: Operative rating tools can enhance performance assessment in surgical training. However, assessments completed late may have questionable reliability. We evaluated the reliability of assessments according to evaluation time-to-completion.

Methods: We stratified assessments from MileMarker's™ Operative Entrustability Assessment by evaluation time-to-completion, using the concordance correlation coefficient (CCC) between self-assessment and evaluator scores as the measure of reliability.
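
As an illustration of the approach described in the Methods, the following is a minimal sketch of a stratified CCC calculation. It assumes the standard Lin definition of the CCC (with 1/n variances); the abstract does not state which estimator or software was used. The time bins are taken from the Results; the function names and the record layout (days to completion, self-assessment score, evaluator score) are hypothetical and are not part of the MileMarker platform.

```python
from collections import defaultdict


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for paired scores.

    Uses population (1/n) variances and covariance:
    CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denom = var_x + var_y + (mean_x - mean_y) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined when all scores are identical")
    return 2 * cov_xy / denom


def time_bin(days):
    """Map days from surgery to evaluation completion onto the abstract's strata."""
    if days < 0:
        raise ValueError("days must be non-negative")
    if days == 0:
        return "0 days"
    if days <= 3:
        return "1-3 days"
    if days <= 13:
        return "4-13 days"
    if days <= 38:
        return "14-38 days"
    return ">38 days"


def stratified_ccc(records):
    """Compute the CCC within each time-to-completion stratum.

    `records` is an iterable of (days, self_score, evaluator_score) tuples;
    this layout is an assumption made for the sketch. Strata with fewer
    than two paired assessments are skipped.
    """
    groups = defaultdict(lambda: ([], []))
    for days, self_score, evaluator_score in records:
        self_scores, evaluator_scores = groups[time_bin(days)]
        self_scores.append(self_score)
        evaluator_scores.append(evaluator_score)
    return {
        label: concordance_correlation(xs, ys)
        for label, (xs, ys) in groups.items()
        if len(xs) >= 2
    }
```

Unlike the Pearson correlation, the CCC penalizes a systematic offset between the two raters through the squared mean difference in the denominator, which is why it is suited to measuring agreement between self-assessment and evaluator scores rather than mere association.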

Results: Overall, self-assessment and evaluator scores were strongly correlated (CCC = 0.72; p < 0.001), though self-assessments were slightly higher (p = 0.048). Reliability remained stable for evaluations completed on the day of surgery (0 days; CCC = 0.77; p < 0.001), 1-3 days after surgery (CCC = 0.73; p < 0.001), and 4-13 days after surgery (CCC = 0.69; p < 0.001), but dropped for evaluations completed 14-38 days (CCC = 0.60; p < 0.001) and more than 38 days (CCC = 0.54; p < 0.001) after surgery. There was strong evidence of an interaction between time-to-completion and reliability (p < 0.001).

Conclusions: Our data support the reliability of assessments completed up to 2 weeks after surgery. This finding may help refine the interpretation of evaluation scores as surgical specialties move toward competency-based accreditation.

Keywords: Graduate medical education; Operative skills; Resident evaluation; Resident performance; Surgical education.

MeSH terms

  • Clinical Competence*
  • Education, Medical, Graduate*
  • General Surgery / education*
  • Humans
  • Knowledge of Results, Psychological*
  • Reproducibility of Results
  • Retrospective Studies
  • Self-Assessment
  • Time Factors