Inter-rater variability and repeatability in the assessment of the Tanner-Whitehouse classification of hand radiographs for the estimation of bone age

Skeletal Radiol. 2024 May 2. doi: 10.1007/s00256-024-04664-w. Online ahead of print.

Abstract

Objective: To determine which bones and which grades had the highest inter-rater variability when employing the Tanner-Whitehouse (T-W) method.

Materials and methods: Twenty-four radiologists were recruited and trained in the T-W classification of skeletal development. Their consistency and skill in determining bone development status were assessed using 20 pediatric hand radiographs of children aged 1 to 18 years. Four radiologists had a poor concordance rate and were excluded. The remaining 20 radiologists undertook a repeat reading of the radiographs, and their results were compared against the mean assessment of two senior experts, which served as the reference standard. Concordance rate, scoring, and Kendall's W were calculated to evaluate accuracy and consistency.
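For readers unfamiliar with Kendall's W (the coefficient of concordance), the following is a minimal sketch of the standard tie-corrected formula, W = 12S / (m^2(n^3 - n) - m ΣT), where m is the number of raters, n the number of rated items, S the sum of squared deviations of the rank totals from their mean, and T the per-rater tie term. The function names and the toy ratings are illustrative assumptions; the abstract does not state which software or which tie handling the authors used.

```python
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks for `values`, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kendalls_w(ratings):
    """Tie-corrected Kendall's W.

    `ratings` is a list of m raters, each a list of n scores for the same
    n items (hypothetical layout, chosen for this sketch).
    """
    m = len(ratings)
    n = len(ratings[0])
    ranked = [average_ranks(r) for r in ratings]
    # Rank total per item, summed over raters.
    totals = [sum(r[i] for r in ranked) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    # Tie term: sum of (t^3 - t) over every group of tied scores, per rater.
    tie_term = sum(
        sum(c ** 3 - c for c in Counter(r).values()) for r in ratings
    )
    denom = m ** 2 * (n ** 3 - n) - m * tie_term
    if denom == 0:
        raise ValueError("W is undefined when every rater ties all items")
    return 12 * s / denom


# Toy data (not from the study): three raters in perfect agreement.
print(kendalls_w([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]))
```

A value of 1 indicates identical rankings across raters and 0 indicates no agreement, which is the scale on which the RUS and C results are reported.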

Results: Both the radius, ulna, and short bones (RUS) system (Kendall's W = 0.833) and the carpal (C) system (Kendall's W = 0.944) had excellent consistency, with the RUS system outperforming the C system in terms of scores. The repeatability analysis showed that the second rating test, performed after 2 months of further bone age assessment (BAA) practice, was more consistent and accurate than the first. The capitate had the lowest average concordance rate and scoring, as well as the lowest overall concordance rate for its D classification. Moreover, the G classifications of the seven carpal bones all had a concordance rate below 0.6. The bones with lower Kendall's W were likewise those with lower scores and concordance rates.
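The per-grade concordance rates reported above can be illustrated with a small sketch. It assumes that the concordance rate is the fraction of readings whose grade matches the reference grade, grouped by reference grade; the abstract does not define the metric precisely, so this definition, the function name, and the toy data are all assumptions.

```python
from collections import defaultdict


def concordance_by_grade(reference, readings):
    """Fraction of readings that match the reference, per reference grade.

    `reference` holds one reference grade per radiograph for a single bone;
    `readings` holds one list per rater, aligned with `reference`.
    (Hypothetical layout, assumed for this sketch.)
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for rater in readings:
        for ref_grade, grade in zip(reference, rater):
            totals[ref_grade] += 1
            if grade == ref_grade:
                hits[ref_grade] += 1
    return {g: hits[g] / totals[g] for g in totals}


# Toy data (not from the study): one bone, four radiographs, three raters.
reference = ["D", "D", "G", "H"]
readings = [
    ["D", "E", "G", "H"],
    ["D", "D", "F", "H"],
    ["C", "D", "G", "H"],
]
rates = concordance_by_grade(reference, readings)
for grade in sorted(rates):
    print(grade, round(rates[grade], 3))
```

Under this assumed definition, a grade flagged in the results (concordance rate below 0.6) is one where fewer than 60% of readings agree with the reference.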

Conclusion: The D grade of the capitate showed the highest variation, and the use of the Tanner-Whitehouse 3rd edition (T-W3) to determine bone age (BA) was frequently inconsistent. A more comprehensive description focused on the inaccurately rated bones and grades, together with a modification of the T-W3 approach, would significantly advance BAA.

Keywords: Bone age; Child; Hand; Inter-rater variation; Radiograph; Tanner–Whitehouse (T-W).