Interrater agreement in the evaluation of discrepant imaging findings with the Radpeer system

AJR Am J Roentgenol. 2012 Dec;199(6):1320-7. doi: 10.2214/AJR.12.8972.


Objective: The Radpeer system is central to the quality assurance process in many radiology practices. Previous studies have shown poor agreement between physicians in the evaluation of their peers. The purpose of this study was to assess the reliability of the Radpeer scoring system.

Materials and methods: A sample of 25 discrepant cases was extracted from our quality assurance database. Images were made anonymous; associated reports and identities of interpreting radiologists were removed. Indications for the studies and descriptions of the discrepancies were provided. Twenty-one subspecialist attending radiologists rated the cases using the Radpeer scoring system. Multirater kappa statistics were used to assess interrater agreement, both with the standard scoring system and with dichotomized scores to reflect the practice of further review for cases rated 3 and 4. Subgroup analyses were conducted to assess subspecialist evaluation of cases.

Results: Interrater agreement was slight to fair compared with that expected by chance. For the group of 21 raters, the kappa values were 0.11 (95% CI, 0.06-0.16) with the standard scoring system and 0.20 (95% CI, 0.13-0.27) with dichotomized scores. There was disagreement about whether a discrepancy had occurred in 20 cases. Subgroup analyses did not reveal significant differences in the degree of interrater agreement.

Conclusion: The identification of discrepant interpretations is valuable for the education of individual radiologists and for larger-scale quality assurance and quality improvement efforts. Our results show that a ratings-based peer review system is unreliable and subjective for the evaluation of discrepant interpretations. Resources should be devoted to developing more robust and objective assessment procedures, particularly those with clear quality improvement goals.

MeSH terms

  • Clinical Competence*
  • Diagnostic Errors / statistics & numerical data
  • Diagnostic Imaging*
  • Humans
  • Peer Review, Health Care*
  • Quality Assurance, Health Care*
  • Radiology / standards*
  • Reproducibility of Results