Testing the raters: inter-rater reliability of standardized anaesthesia simulator performance

Can J Anaesth. 1997 Sep;44(9):924-8. doi: 10.1007/BF03011962.


Purpose: Assessment of physician performance has been a subjective process. An anaesthesia simulator could be used for a more structured and standardized evaluation but its reliability for this purpose is not known. We sought to determine if observers witnessing the same event in an anaesthesia simulator would agree on their rating of anaesthetist performance.

Methods: The study was approved by the research ethics board. Two one-hour clinical scenarios were developed, each containing five anaesthetic problems. For each problem, a rating scale defined the appropriate score (no response to the situation: score = 0; compensating intervention, defined as physiological correction: score = 1; corrective treatment, defined as definitive therapy: score = 2). To assess inter-rater reliability, videotape recordings were generated through role-playing; each of the two scenarios was recorded three times, yielding a total of 30 events to be evaluated. Two clinical anaesthetists, who were not involved in developing the study or the clinical scenarios, independently reviewed and scored each of the 30 problems. The scores produced by the two observers were compared using the kappa statistic of agreement.
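
As an illustration of the comparison described above, the following is a minimal sketch of an unweighted Cohen's kappa for two raters scoring 30 items on the 0/1/2 scale. The abstract does not report the individual ratings or state which variant of kappa was used, so the function name `cohen_kappa` and both rating lists are hypothetical; the lists are constructed only so that the raters disagree on one item out of 30, and they are not the study data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: fraction of items given the same score by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal proportions, summed over scores.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings (NOT the study data): 30 items scored 0, 1 or 2,
# with the two raters disagreeing on the final item only.
rater_a = [0] * 10 + [1] * 10 + [2] * 10
rater_b = [0] * 10 + [1] * 10 + [2] * 9 + [1]

kappa = cohen_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")
```

With these made-up ratings the sketch gives a kappa of 0.95. The value depends on how the scores are distributed across the three categories and on whether a weighted variant is used, so it should not be expected to match the published figure.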

Results: The raters were in complete agreement on 29 of the 30 items. There was excellent inter-rater reliability (κ = 0.96, P < 0.001).

Conclusions: The use of videotapes allowed the scenarios to be scored by reproducing the same event for each observer. There was excellent inter-rater agreement within the confines of the study. Ratings of video recordings of anaesthetist performance in a simulation setting can be used to score performance. The validity of the scenarios and the scoring system for assessing clinician performance has yet to be determined.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Anesthesia, General
  • Anesthesiology / education*
  • Anesthesiology / standards
  • Clinical Competence
  • Computer Simulation / standards*
  • Decision Making
  • Educational Measurement / standards*
  • Evaluation Studies as Topic
  • Humans
  • Monitoring, Intraoperative
  • Observer Variation
  • Problem Solving
  • Reproducibility of Results
  • Role Playing
  • Thinking
  • Videotape Recording