Interobserver variability in thyroid ultrasound

Endocrine. 2024 Feb 19. doi: 10.1007/s12020-024-03731-5. Online ahead of print.

Abstract

Purpose: Ultrasound is the preferred technique for evaluating thyroid nodules, but it depends on operator interpretation, which leads to interobserver variability. The current study aimed to determine the agreement among physicians on nodule characteristics, risk categorization under the classification systems, and the need for fine-needle aspiration.

Methods: Four endocrinologists from the same center blindly evaluated 100 ultrasound images of thyroid nodules from 100 different patients. The following ultrasound features were evaluated: composition, echogenicity, margins, calcifications, and microcalcifications. Nodules were also classified according to ATA, EU-TIRADS, K-TIRADS, and ACR-TIRADS classifications. Krippendorff's alpha test was used to assess interobserver agreement.
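
For readers unfamiliar with the statistic, the following is a minimal sketch of Krippendorff's alpha for nominal categories, computed from a coincidence matrix. It is illustrative only: the abstract does not state which level of measurement, software, or confidence-interval method the authors used, and the function name `krippendorff_alpha_nominal` and the toy ratings are hypothetical. Risk categories could reasonably be treated as ordinal instead, which would require a different distance function.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(ratings):
    """Krippendorff's alpha for nominal data.

    ratings: list of units (e.g. nodules); each unit is a list of category
    labels, one per observer, with None marking a missing rating.
    Hypothetical helper, not the authors' implementation.
    """
    # Coincidence matrix: each ordered pair of ratings within a unit
    # contributes 1 / (m - 1), where m is the number of ratings in that unit.
    coincidences = Counter()
    for unit in ratings:
        values = [v for v in unit if v is not None]
        m = len(values)
        if m < 2:
            continue  # units with fewer than two ratings carry no pairing information
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per category and overall number of pairable ratings.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("not enough pairable ratings")

    # Observed and expected disagreement (nominal distance: 0 if equal, 1 otherwise).
    # Both are left unnormalised by n, which cancels in the ratio.
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one category is used")

    return 1.0 - observed / expected


# Toy example (made-up labels, not study data): two observers, four nodules.
toy = [["solid", "solid"], ["solid", "cystic"], ["cystic", "cystic"], ["cystic", "cystic"]]
alpha = krippendorff_alpha_nominal(toy)
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance, which is the scale on which the coefficients in the Results are reported.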

Results: Interobserver agreement (Krippendorff's alpha) for the ultrasound features was 0.80 (0.71-0.89) for composition, 0.59 (0.47-0.72) for echogenicity, 0.73 (0.57-0.88) for margins, 0.55 (0.40-0.69) for calcifications, and 0.50 (0.34-0.67) for microcalcifications. Agreement for the classification systems was 0.7 (0.61-0.80) for ATA, 0.63 (0.54-0.73) for EU-TIRADS, 0.64 (0.55-0.73) for K-TIRADS, and 0.68 (0.60-0.77) for ACR-TIRADS. Agreement on the indication for fine-needle aspiration (FNA) was 0.86 (0.71-1), 0.80 (0.71-0.88), 0.77 (0.67-0.87), and 0.73 (0.64-0.83) for the same four systems, respectively.

Conclusions: Interobserver agreement was acceptable for identifying nodules that require cytologic study under the various classification systems. However, agreement was limited for risk stratification and for many of the ultrasonographic characteristics of the nodules.

Keywords: Interobserver agreement; Interobserver variability; Thyroid nodule; Thyroid ultrasound; Ultrasound classification systems.