Purpose: For peer review of teaching to be credible and reliable, peer raters must be trained to identify and measure teaching behaviors accurately. Peer rater training, therefore, must be based on expert-derived rating standards of teaching performance. The authors sought to establish precise lecture rating standards for use in peer rater training at their school.
Method: From 2008 to 2010, a panel of experts, who had previously helped to develop an instrument for the peer assessment of lecturing, met to observe, discuss, and rate 40 lectures, using a consensus-building model to determine key behaviors and levels of proficiency for each of the instrument's 11 criteria. During this process, the panelists supplemented the original instrument with precise behavioral descriptors of lecturing. The reliability of the derived rating standards was assessed by having the panelists score six sample lectures independently.
Results: Intraclass correlation coefficients of the panelists' ratings of the lectures ranged from 0.75 to 0.96. There was a moderate to high positive association between 10 of the instrument's 11 criteria and the overall performance score (r = 0.752-0.886). There were no statistically significant differences among raters in terms of leniency or stringency of scores.
Conclusions: Two relational themes, content and style, were identified among the instrument's variables. Recommendations for developing expert-derived rating standards include using an interdisciplinary group for observation, discussion, and verbal identification of behaviors; asking members to consider views that contrast with their own; and noting key teaching behaviors for use in future peer rater training.