PI-RADS Versions 2 and 2.1: Interobserver Agreement and Diagnostic Performance in Peripheral and Transition Zone Lesions Among Six Radiologists

AJR Am J Roentgenol. 2021 Jul;217(1):141-151. doi: 10.2214/AJR.20.24199. Epub 2020 Sep 9.

Abstract

BACKGROUND. PI-RADS version 2.1 (v2.1) modifications primarily address transition zone (TZ) interpretation. The revisions also impact peripheral zone (PZ) interpretation, which has received less attention.

OBJECTIVE. The purpose of this study was to compare interobserver agreement of PI-RADS version 2 (v2) and v2.1 in the prostate PZ and TZ and perform a pilot comparison of their diagnostic performance in the two zones.

METHODS. Six radiologists with varying experience retrospectively assessed 80 prostate lesions (40 PZ, 40 TZ) on MRI in separate sessions for PI-RADS v2 and v2.1. Interobserver agreement was assessed using Conger kappa (κ). For 50 lesions with pathology data, average AUC for detecting clinically significant cancer was compared between versions using multireader multicase statistical methods. Error variance and covariance results informed post hoc power analysis.

RESULTS. Interobserver agreement for PI-RADS category 4 or greater was higher for version 2.1 (κ = 0.64) than version 2 (κ = 0.51) in the PZ, but similar for version 2 (κ = 0.64) and version 2.1 (κ = 0.60) in the TZ. The PI-RADS v2.1 DWI descriptor "linear/wedge-shaped" had higher agreement than its predecessor version 2 descriptor "indistinct hypointense" (κ = 0.52 vs κ = 0.18) and yielded 14 more true-negative versus five more false-negative interpretations. The ADC signal descriptor "markedly hypointense," for which only version 2.1 provides a specific definition, had lower agreement in version 2.1 (κ = 0.26) than version 2 (κ = 0.52). Modified TZ T2-weighted category 2 descriptors in version 2.1 had fair agreement (κ = 0.21), and agreement for PI-RADS category 2 in the TZ was lower in version 2.1 (κ = 0.31) than version 2 (κ = 0.57). DWI upgraded a TZ lesion category from 2 to 3 in four patients, detecting two additional cancers. Average AUC was not different between versions 2 and 2.1 for the PZ (AUC, 0.81 vs 0.85; p = .24) or the TZ (AUC, 0.69 vs 0.69; p = .94), though among experienced readers AUC was higher for version 2.1 than version 2 for the PZ (0.91 vs 0.82; p = .001). Overall performance comparison had sufficient power (0.8) to detect a 0.085 difference in AUC.

CONCLUSION. Interobserver agreement improved using PI-RADS v2.1 in the PZ but not the TZ. Diagnostic performance improved using version 2.1 only in the PZ for experienced readers. Specific version 2.1 modifications yielded mixed results.

CLINICAL IMPACT. The impact of PI-RADS v2.1 in the PZ is notable given the emphasis on version 2.1 TZ modifications. The findings suggest areas in which additional modification could further improve interobserver agreement and performance.
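
The abstract reports Conger kappa for six readers but does not give the formula or the authors' software. As an illustration only, the sketch below assumes the commonly cited multirater form: observed agreement is the share of agreeing reader pairs across lesions, and chance agreement is the average, over reader pairs, of the product of the two readers' marginal category proportions. The function names, the threshold helper, and the scores are hypothetical and are not study data.

```python
from itertools import combinations


def binarize(scores, threshold=4):
    """Dichotomize PI-RADS categories (e.g. 'category 4 or greater')."""
    return [[s >= threshold for s in row] for row in scores]


def conger_kappa(ratings):
    """Multirater kappa, assuming Conger's formulation.

    ratings: one row per lesion, one category label per reader
    (every reader rates every lesion).
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({c for row in ratings for c in row})
    pairs = list(combinations(range(n_raters), 2))

    # Observed agreement: fraction of (lesion, reader-pair) combinations that agree.
    agree = sum(row[g] == row[h] for row in ratings for g, h in pairs)
    p_obs = agree / (n_subjects * len(pairs))

    # Marginal category proportions for each reader.
    marg = [
        {c: sum(row[g] == c for row in ratings) / n_subjects for c in categories}
        for g in range(n_raters)
    ]

    # Chance agreement: mean over reader pairs of sum_k p_gk * p_hk.
    p_exp = sum(
        sum(marg[g][c] * marg[h][c] for c in categories) for g, h in pairs
    ) / len(pairs)

    if p_exp == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical PI-RADS categories: 4 lesions (rows) x 3 readers (columns).
example_scores = [[5, 4, 4], [2, 3, 2], [4, 3, 4], [1, 2, 2]]
print(round(conger_kappa(binarize(example_scores)), 3))
```

With only two readers this reduces to Cohen kappa, which gives a quick sanity check on any implementation.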

Keywords: AUC; MRI; prostatic neoplasms; radiologists; reproducibility of results.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Aged
  • Humans
  • Magnetic Resonance Imaging / methods*
  • Male
  • Middle Aged
  • Observer Variation
  • Prostate / diagnostic imaging
  • Prostatic Neoplasms / diagnostic imaging*
  • Radiologists / statistics & numerical data*
  • Radiology Information Systems*
  • Reproducibility of Results
  • Retrospective Studies
  • Sensitivity and Specificity