Purpose: To assess the variability of computed tomographic (CT) measurements of lesions that differ in size, margin sharpness, and organ location when obtained by readers with different levels of experience, as would occur in routine clinical practice.
Materials and methods: In this institutional review board-approved, HIPAA-compliant retrospective study, 17 radiologists with varying levels of experience independently obtained bidimensional orthogonal axial measurements of 80 lymph nodes, 120 pulmonary lesions, and 120 hepatic lesions, categorized by size and margin sharpness. Repeat measurements were performed 2 or more weeks later. Intraclass correlation coefficients and Bland-Altman plots were used to assess intra- and interobserver variability.
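The two agreement statistics named above can be sketched in a few lines. This is a minimal illustration, not the study's analysis code: the abstract does not state which form of the intraclass correlation coefficient was used or how percent differences were defined, so the two-way random-effects, absolute-agreement, single-measurement form (often written ICC(2,1)) and the percent difference relative to the pair mean are both assumptions, as are the function names.

```python
from statistics import mean, stdev


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    `ratings` is a list of rows, one per lesion; each row holds one
    measurement per reader (or per reading session). The choice of ICC
    form is an assumption; the abstract does not specify it.
    """
    n = len(ratings)       # number of lesions
    k = len(ratings[0])    # number of readers or sessions
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Two-way ANOVA sums of squares: lesions, readers, and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def percent_limits_of_agreement(first, second):
    """Bland-Altman bias and 95% limits of agreement on percent differences.

    Each difference is expressed relative to the mean of the pair
    (an assumed definition). Returns (bias, lower_limit, upper_limit).
    """
    diffs = [100.0 * (a - b) / ((a + b) / 2.0) for a, b in zip(first, second)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread
```

For intraobserver variability, `first` and `second` would hold one reader's initial and repeat measurements of the same lesions; for interobserver variability, each row passed to `icc_2_1` would hold the measurements of one lesion by the different readers.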
Results: For long- and short-axis measurements, respectively, overall intraobserver agreement rates were 0.957 (95% confidence interval [CI]: 0.947, 0.966) and 0.945 (95% CI: 0.933, 0.955); interobserver agreement rates were 0.954 (95% CI: 0.943, 0.963) and 0.941 (95% CI: 0.929, 0.951). Both intra- and interobserver agreement differed by lesion size, margin sharpness, location, and reader experience. Agreement was 0.847-0.886 for lesions 20 mm or larger versus 0.745-0.785 for lesions smaller than 10 mm, 0.961-0.975 for smooth margins versus 0.924-0.942 for irregular margins, 0.955-0.97 for lung lesions versus 0.884-0.94 for lymph nodes, and 0.95-0.97 for attending radiologists versus 0.928-0.945 for fellows. Measurement variability decreased with increasing lesion size; 95% limits of agreement for short-axis measurements were -11.6% to 6.7% for lesions smaller than 10 mm versus -6.2% to 4.7% for lesions 20 mm or larger.
Conclusion: Overall intra- and interobserver variability rates were similar; in clinical practice, serial CT measurements can be safely performed by different radiologists. Smooth margins, larger lesion size, and greater reader experience were associated with more consistent measurements. Depending on lesion size, increases of 4%-6% or greater in long axis and 5%-7% or greater in short axis, and decreases of 6%-10% or greater in long axis and 6%-12% or greater in short axis, at CT can be considered true changes rather than measurement variation, with 95% confidence.
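As an illustration of how such thresholds might be applied, the sketch below checks a short-axis change against the 95% limits of agreement quoted in the Results. It covers only the two size categories for which the abstract reports limits; lesions of 10 mm to less than 20 mm are deliberately left unclassified. Computing percent change relative to the baseline measurement and assigning the size category from the baseline value are assumptions, and all names are hypothetical.

```python
# Short-axis 95% limits of agreement (percent), as quoted in the Results.
# Only the two size categories reported in the abstract are included.
SHORT_AXIS_LIMITS = {
    "smaller than 10 mm": (-11.6, 6.7),
    "20 mm or larger": (-6.2, 4.7),
}


def size_category(baseline_mm):
    """Map a baseline short-axis size to a reported category, or None."""
    if baseline_mm < 10:
        return "smaller than 10 mm"
    if baseline_mm >= 20:
        return "20 mm or larger"
    return None  # no limits are quoted for 10 mm to less than 20 mm


def percent_change(baseline_mm, followup_mm):
    """Percent change relative to baseline (an assumed definition)."""
    return 100.0 * (followup_mm - baseline_mm) / baseline_mm


def classify_short_axis_change(baseline_mm, followup_mm):
    """Label a change as an increase, a decrease, or within variability."""
    category = size_category(baseline_mm)
    if category is None:
        return "no limits reported"
    lower, upper = SHORT_AXIS_LIMITS[category]
    change = percent_change(baseline_mm, followup_mm)
    if change > upper:
        return "increase"
    if change < lower:
        return "decrease"
    return "within variability"
```

For example, under these assumptions a 25 mm short axis that measures 23 mm at follow-up (an 8% decrease) falls outside the -6.2% lower limit, whereas an 8 mm short axis that measures 7.5 mm (a 6.25% decrease) stays within the wider limits for small lesions.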