Context: The quality of peer reviewers is crucial to journal quality, but most journals have too many reviewers for editors to know them all personally. A reliable method of rating reviewers (for education and monitoring) is needed.
Objective: To determine whether editors' quality ratings of peer reviewers are reliable and how they compare with other measures of reviewer performance.
Design: A 3.5-year prospective observational study.
Setting: Peer-reviewed journal.
Participants: All editors and peer reviewers who reviewed at least 3 manuscripts.
Main outcome measures: Reviewer quality ratings, individual reviewer rate of recommendation for acceptance, congruence between reviewer recommendation and editorial decision (decision congruence), and accuracy in reporting flaws in a masked test manuscript.
Interventions: Editors rated the quality of each review on a subjective scale of 1 to 5.
Results: A total of 4161 reviews of 973 manuscripts by 395 reviewers were studied. The within-reviewer intraclass correlation was 0.44 (P<.001), indicating that 20% of the variance in review ratings was attributable to the reviewer. Intraclass correlations for editor and manuscript were only 0.24 and 0.12, respectively. Reviewers' average quality ratings correlated poorly with their rate of recommendation for acceptance (R=-0.34) and with decision congruence (R=0.26). Among the 124 reviewers of the fictitious test manuscript, reviewers' mean quality ratings were modestly correlated with the number of flaws they reported (R=0.53); highly rated reviewers reported twice as many flaws as poorly rated reviewers.
Conclusions: Subjective editor ratings of individual reviewers were moderately reliable and correlated with reviewers' ability to report manuscript flaws. A reviewer's rate of recommendation for acceptance and decision congruence might seem to be markers of a discriminating (ie, high-quality) reviewer, but these variables correlated poorly with editors' ratings of review quality and with the reviewer's ability to detect flaws in a fictitious manuscript. Therefore, they cannot substitute for actual quality ratings by editors.