Background: One of the most important weaknesses of the peer review process is that different reviewers' ratings of the same grant proposal typically differ. Studies on the inter-rater reliability of peer reviews mostly report only average values across all submitted proposals. But inter-rater reliabilities can vary depending on the scientific discipline or the requested grant sum, for instance.
Goal: Taking the Austrian Science Fund (FWF) as an example, we aimed to investigate empirically the heterogeneity of inter-rater reliabilities (intraclass correlation) and its determinants.
Methods: The data consisted of N = 8,329 proposals with N = 23,414 overall ratings by reviewers, which were statistically analyzed using the generalized estimating equations approach (GEE).
Results: We found an overall intraclass correlation (ICC) of reviewer? ratings of ρ = .259 with a 95% confidence interval of [.249,.279]. In humanities the ICCs were statistically significantly higher than in all other research areas except technical sciences. The ICC in biosciences deviated statistically significantly from the average ICC. Other factors (besides the research areas), such as the grant sum requested, had negligible influence on the ICC.
Conclusions: Especially in biosciences, the number of reviewers of each proposal should be increased so as to increase the ICC.