Interobserver agreement and efficacy of consensus reading in Kwak-, EU-, and ACR-thyroid imaging reporting and data systems and ATA guidelines for the ultrasound risk stratification of thyroid nodules

Endocrine. 2020 Jan;67(1):143-154. doi: 10.1007/s12020-019-02134-1. Epub 2019 Nov 18.

Abstract

Purpose: To investigate the interobserver agreement (IA) and the impact of consensus reading using four risk stratification systems for thyroid nodules (TN).

Methods: Four experienced specialists independently rated ultrasound (US) images of 80 TN according to the Kwak-TIRADS, EU-TIRADS, ACR TI-RADS, and ATA Guidelines. The cases were randomly extracted from a prospectively acquired database (n > 1500 TN), and the observers were blinded to clinical data. The study was divided into two sessions (S1 and S2) of 40 image sets each, and each session was followed by a consensus reading (C1 and C2, respectively). The effect of C1 was thus tested in S2 on 40 new cases. Fleiss' kappa (κ) was calculated for S1 and S2 to estimate the IA and learning curves. The results of C1 and C2 were used as the reference for diagnostic accuracy calculations.
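As an illustration of the agreement statistic named above, the following is a minimal sketch of Fleiss' kappa for a fixed number of raters per nodule. It is not the authors' implementation; the function name `fleiss_kappa` and the example ratings are hypothetical and are not data from the study.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of per-nodule rating lists.

    ratings: one inner list per nodule, holding one category label per
    rater (hypothetical input layout). Every nodule must have the same
    number of raters.
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("each nodule needs the same number (>= 2) of ratings")

    categories = sorted({label for row in ratings for label in row})
    counts = [Counter(row) for row in ratings]

    # Observed agreement per nodule: share of rater pairs that agree.
    p_i = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Chance agreement from the overall share of each category.
    p_j = [sum(c[cat] for c in counts) / (n_subjects * n_raters)
           for cat in categories]
    p_e = sum(p * p for p in p_j)

    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical example: four raters, two nodules, categories 3 and 5.
perfect = fleiss_kappa([[3, 3, 3, 3], [5, 5, 5, 5]])
split = fleiss_kappa([[3, 3, 5, 5], [3, 3, 5, 5]])
```

With this layout, one κ value per session and per risk stratification system would be obtained by passing the 40 four-rater rating lists of that session to the function.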

Results: IA increased significantly (p < 0.01) after C1: κ values in S1 (S2) were 0.375 (0.615), 0.411 (0.596), 0.321 (0.569), and 0.410 (0.583) for the Kwak-TIRADS, EU-TIRADS, ACR TI-RADS, and ATA Guidelines, respectively. ROC analysis (C1 + C2) revealed similar areas under the curve (AUC) for the Kwak-TIRADS, EU-TIRADS, ACR TI-RADS, and ATA Guidelines (0.635, 0.675, 0.694, and 0.654, respectively, n.s.). AUC did not increase from C1 (0.677 ± 0.010) to C2 (0.632 ± 0.052, n.s.). The ATA Guidelines were not applicable in five cases.
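For readers unfamiliar with how an AUC is obtained from ordinal risk categories, the sketch below computes it as the probability that a randomly chosen positive case receives a higher category than a randomly chosen negative case, with ties counted as one half (the Mann-Whitney formulation). This is an assumption-laden illustration, not the authors' analysis: the abstract does not state how the binary reference diagnosis was defined, and the function name `auc_ordinal`, the scores, and the labels are all hypothetical.

```python
def auc_ordinal(scores, labels):
    """AUC for ordinal scores against binary labels (1 = positive, 0 = negative).

    Equals the fraction of positive/negative pairs in which the positive
    case has the higher score; tied pairs contribute one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical consensus categories for five nodules and a hypothetical
# binary reference (1 = malignant, 0 = benign).
example_auc = auc_ordinal([5, 4, 3, 3, 2], [1, 1, 0, 1, 0])
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which gives context for the reported values between 0.635 and 0.694.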

Conclusions: IA and diagnostic accuracy were very similar for the four investigated risk stratification systems. Consensus reading sessions significantly improved the IA but did not affect the diagnostic accuracy.

Keywords: Consensus; Head and neck neoplasms; Interobserver agreement; TI-RADS; Thyroid nodule; Ultrasonography.

MeSH terms

  • Consensus
  • Data Systems
  • Humans
  • Observer Variation
  • Reading
  • Retrospective Studies
  • Risk Assessment
  • Thyroid Nodule* / diagnostic imaging
  • Ultrasonography