Testing the Newcastle Ottawa Scale showed low reliability between individual reviewers

J Clin Epidemiol. 2013 Sep;66(9):982-93. doi: 10.1016/j.jclinepi.2013.03.003. Epub 2013 May 16.


Objectives: To assess inter-rater reliability and validity of the Newcastle Ottawa Scale (NOS) used for methodological quality assessment of cohort studies included in systematic reviews.

Study design and setting: Two reviewers independently applied the NOS to 131 cohort studies included in eight meta-analyses. Inter-rater reliability was calculated using kappa (κ) statistics. To assess validity, within each meta-analysis, we generated a ratio of pooled estimates for each quality domain. Using a random-effects model, the ratios of odds ratios for each meta-analysis were combined to give an overall estimate of differences in effect estimates.
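The validity analysis described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the ratio of odds ratios (ROR) in each meta-analysis compares two independent subgroups of studies (for example, those meeting versus not meeting an NOS item), and it uses the DerSimonian-Laird estimator as one common random-effects method. The abstract does not name the estimator or the direction of the ratio. The function names `log_ror` and `pool_random_effects` are hypothetical.

```python
import math


def log_ror(or_a, or_b, var_log_or_a, var_log_or_b):
    """Log ratio of odds ratios for two subgroups and its variance.

    Assumes the two pooled odds ratios come from independent sets of
    studies, so the variances of their logs simply add.
    """
    return math.log(or_a / or_b), var_log_or_a + var_log_or_b


def pool_random_effects(effects, variances):
    """DerSimonian-Laird pooling of log-scale effects (here, log RORs).

    Returns the pooled ratio and an approximate 95% confidence interval,
    all back-transformed to the ratio scale.
    """
    k = len(effects)
    if k == 0 or k != len(variances):
        raise ValueError("effects and variances must be non-empty and equal length")

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))

    # Between-meta-analysis variance, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights and pooled estimate.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return (
        math.exp(pooled),
        math.exp(pooled - 1.96 * se),
        math.exp(pooled + 1.96 * se),
    )
```

A pooled ROR whose confidence interval includes 1 would indicate no detectable difference in effect estimates between the two quality categories.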

Results: Inter-rater reliability varied from substantial for length of follow-up (κ = 0.68, 95% confidence interval [CI] = 0.47, 0.89) to poor for selection of the nonexposed cohort (κ = -0.03, 95% CI = -0.06, 0.00) and for demonstration that the outcome was not present at the outset of the study (κ = -0.06, 95% CI = -0.20, 0.07). Reliability for the overall score was fair (κ = 0.29, 95% CI = 0.10, 0.47). In general, reviewers found the tool difficult to use and the decision rules vague, even with the additional information provided as part of this study. We found no association between individual items or overall score and effect estimates.
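To make the κ values concrete, the sketch below computes unweighted Cohen's κ for two raters. The abstract does not state which κ variant was used (weighted or unweighted), so the unweighted form is an assumption. The function name `cohen_kappa` and the example ratings are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    n = len(ratings_a)
    if n == 0 or n != len(ratings_b):
        raise ValueError("both raters must score the same non-empty set of items")

    # Observed agreement: share of items given the same rating by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: product of each rater's marginal proportions, summed
    # over categories (a category unused by one rater contributes zero).
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings: 1 = item awarded a star, 0 = not awarded.
reviewer_1 = [1, 1, 0, 0]
reviewer_2 = [1, 0, 0, 0]
kappa = cohen_kappa(reviewer_1, reviewer_2)  # 0.5
```

Because κ subtracts chance agreement, it falls to zero or below when reviewers agree no more often than their marginal rating frequencies would predict, which is how the slightly negative values above should be read.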

Conclusion: Variable agreement and lack of evidence that the NOS can identify studies with biased results underscore the need for revisions and more detailed guidance for systematic reviewers using the NOS.

Keywords: Cohort studies; Internal validity; Methodological quality; Reliability; Systematic reviews; Validity.

MeSH terms

  • Cohort Studies
  • Humans
  • Meta-Analysis as Topic*
  • Observer Variation*
  • Reproducibility of Results
  • Research Design
  • Review Literature as Topic*