Objective: To measure the reliability and preliminary validity of a grading instrument for editors to evaluate the quality of peer reviews.
Design: The consecutive sample design included 53 reviews of 23 manuscripts. Reviews were systematically assigned to interrater reliability (n = 41; power greater than 0.90 to detect a difference of greater than one point) and preliminary criterion-related validity (n = 12) subsamples. Content validity was closely examined.
Participants: Three graders evaluated reliability. One individual examined content validity and two editors tested preliminary criterion-related validity.
Intervention (instrument): Attributes reflecting two basic dimensions, review content and format, were identified and scored (values are possible points/percent contribution): timeliness, 3/21%; grade sheet, 1/7%; etiquette, 1/7%; sectional narratives, 3/21%; citations, 2/14%; narrative summary, 2/14%; and insights, 2/14%. A scoring guide was provided.
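The point allocation above can be checked with a short sketch. The maximum points per attribute are taken from the abstract; the function names and the validation rule (an awarded score must lie between 0 and the attribute's maximum) are illustrative assumptions, since the scoring guide itself is not reproduced here.

```python
# Maximum points per attribute, as listed in the abstract.
MAX_POINTS = {
    "timeliness": 3,
    "grade sheet": 1,
    "etiquette": 1,
    "sectional narratives": 3,
    "citations": 2,
    "narrative summary": 2,
    "insights": 2,
}


def percent_contributions(max_points):
    """Return each attribute's share of the total possible points, rounded to a whole percent."""
    total = sum(max_points.values())
    return {name: round(100 * pts / total) for name, pts in max_points.items()}


def total_score(awarded, max_points=MAX_POINTS):
    """Sum the awarded points after checking each against its maximum.

    Hypothetical helper: the range check is an assumption, not part of the published guide.
    """
    for name, pts in awarded.items():
        if name not in max_points:
            raise KeyError(f"unknown attribute: {name}")
        if not 0 <= pts <= max_points[name]:
            raise ValueError(f"{name}: {pts} is outside 0..{max_points[name]}")
    return sum(awarded.values())


if __name__ == "__main__":
    print(sum(MAX_POINTS.values()))           # total possible points
    print(percent_contributions(MAX_POINTS))  # per-attribute percent contribution
```

The attribute maxima sum to 14 points, which is why a 3-point attribute contributes about 21% and a 1-point attribute about 7%.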
Main outcome measures: Interrater reliability of the total score was tested with the intraclass correlation coefficient and with analysis of variance, for which the null hypothesis of no difference between mean scores was expected to be retained. Kendall's coefficient of concordance was used to test preliminary criterion-related validity.
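As an illustration of the two agreement statistics named above, here is a minimal sketch in plain Python. The abstract does not state which form of the intraclass correlation coefficient was used; the sketch assumes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1). The Kendall's W function assumes untied ranks. The function names and data layouts are hypothetical and are not taken from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per review and one column per grader.

    Assumption: two-way random effects, absolute agreement, single rater.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: reviews (rows), graders (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def kendalls_w(ranks):
    """Kendall's W for one row of ranks per rater, one column per item.

    Assumption: each rater assigns the ranks 1..n with no ties.
    """
    m, n = len(ranks), len(ranks[0])
    rank_sums = [sum(r[j] for r in ranks) for j in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((t - mean_sum) ** 2 for t in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))
```

Both statistics equal 1 when raters agree perfectly. An ICC of this form also drops when one grader scores systematically higher than another, which complements the analysis-of-variance check on mean scores.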
Results: The intraclass correlation coefficient was .84 (P < .001) and a lack of difference between mean scores was demonstrated by analysis of variance (P = .46). Content validity was confirmed and preliminary criterion-related validity was indicated (Kendall's coefficient of concordance = .94, P = .038).
Conclusions: The instrument is reliable. Content validation has been completed, and further criterion-related validation is warranted.