Interrater Reliability in Toxicity Identification: Limitations of Current Standards

Int J Radiat Oncol Biol Phys. 2020 Aug 1;107(5):996-1000. doi: 10.1016/j.ijrobp.2020.04.040. Epub 2020 May 3.


Purpose: The National Cancer Institute Common Terminology Criteria for Adverse Events (CTCAE) v5.0 is the standard for oncology toxicity encoding and grading, despite limited validation. We assessed interrater reliability (IRR) in multireviewer toxicity identification.

Methods and materials: Two reviewers independently reviewed 100 randomly selected electronic health record notes from weekly on-treatment visits during radiation therapy. Discrepancies were adjudicated by a third reviewer to reach consensus. Term harmonization was performed to account for overlapping symptoms in CTCAE. IRR was assessed using unweighted and weighted Cohen's kappa coefficients.
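
For readers unfamiliar with the statistic, the following is a minimal sketch of how unweighted and weighted Cohen's kappa can be computed for two raters. It is not the authors' analysis code. The function name cohen_kappa, the choice of linear weights, and the toy ratings in the usage note are assumptions made for illustration, since the abstract does not state the weighting scheme or publish the underlying ratings.

```python
def cohen_kappa(ratings_a, ratings_b, categories, weighted=False):
    """Cohen's kappa for two raters over the same items.

    categories must be listed in ordinal order when weighted=True.
    Linear disagreement weights are an assumption of this sketch;
    the abstract does not say which weighting scheme was used.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")

    n = len(ratings_a)
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}

    # Observed joint proportions: observed[i][j] is the share of items
    # that rater A put in category i and rater B put in category j.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        if weighted and k > 1:
            return abs(i - j) / (k - 1)   # linear weight, 0 on the diagonal
        return 0.0 if i == j else 1.0     # unweighted: any mismatch counts fully

    d_obs = sum(disagreement(i, j) * observed[i][j]
                for i in range(k) for j in range(k))
    d_exp = sum(disagreement(i, j) * row[i] * col[j]
                for i in range(k) for j in range(k))

    if d_exp == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - d_obs / d_exp
```

As a usage note, with hypothetical ordinal grades a = [0, 1, 2, 2] and b = [0, 1, 2, 0] over categories [0, 1, 2], calling cohen_kappa(a, b, [0, 1, 2]) gives the unweighted value and cohen_kappa(a, b, [0, 1, 2], weighted=True) gives the linearly weighted value. The weighted value is lower here because the single disagreement spans the full width of the scale.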

Results: Between reviewers, the unweighted kappa was 0.68 (95% confidence interval, 0.65-0.71) and the weighted kappa was 0.59 (0.22-1.00). IRR was consistent between symptoms noted as present or absent with a kappa of 0.6 (0.66-0.71) and 0.6 (0.65-0.69), respectively.

Conclusions: Significant discordance suggests that toxicity identification, particularly when performed retrospectively, is a complex and error-prone task. Strategies to improve IRR, including training and simplification of the CTCAE criteria, should be considered in trial design and future terminologies.

MeSH terms

  • Humans
  • National Cancer Institute (U.S.) / standards
  • Neoplasms / radiotherapy*
  • Observer Variation
  • Radiotherapy / adverse effects*
  • Radiotherapy / standards*
  • Reference Standards
  • United States