Background: Medical practitioners' reflective skills are increasingly considered important and therefore included in the medical education curriculum. However, assessing students' reflective skills using rubrics does not appear to guarantee adequate inter-rater reliabilities. Recently, comparative judgment was introduced as a new method to evaluate performance assessments. This study investigates the merits and limitations of the comparative judgment method for assessing students' written self-reflections. More specifically, it examines the reliability in relation to the time spent assessing, the correlation between the scores obtained using the two methods (rubrics and comparative judgment), and, raters' perceptions of the comparative judgment method. Approach: Twenty-two self-reflections, that had previously been scored using a rubric, were assessed by a group of eight raters using comparative judgment. Two hundred comparisons were completed and a rank order was calculated. Raters' impressions were investigated using a focus group.
Findings: Using comparative judgment, each self-reflection needed to be compared seven times with another self-reflection to reach a scale separation reliability of .55. The inter-rater reliability of rating (ICC, (1, k)) using rubrics was .56. The time investment required for these reliability levels in both methods was around 24 minutes. The Kendall's tau rank correlation indicated a strong correlation between the scores obtained via both methods. Raters reported that making comparisons made them evaluate the quality of self-reflections in a more nuanced way. Time investment was, however, considered heavy, especially for the first comparisons. Although raters appreciated that they did not have to assign a grade to each self-reflection, the fact that the method does not automatically lead to a grade or feedback was considered a downside.
Conclusions: First evidence was provided for the comparative judgment method as an alternative to using rubrics for assessing students' written self-reflections. Before comparative judgment can be implemented for summative assessment, more research is needed on the time investment required to ensure no contradictory feedback is given back to students. Moreover, as the comparative judgment method requires an additional standard setting exercise to obtain grades, more research is warranted on the merits and limitations of this method when a pass/fail approach is used.
Keywords: comparative judgment; reflective skills; reliability; rubrics.