Purpose: To quantify observer agreement and analyze causes of disagreement in identifying honeycombing at chest computed tomography (CT).
Materials and methods: The institutional review board approved this multi-institutional, HIPAA-compliant retrospective study, and informed patient consent was not required. Five core study members scored 80 CT images on a five-point scale (5 = definitely yes, 1 = definitely no) to establish a reference standard for the identification of honeycombing. Forty-three observers from various subspecialties and geographic regions scored the same CT images by using the same scale. Weighted κ values comparing each observer's honeycombing scores with the reference standard were analyzed to investigate intergroup differences. To allow analysis of the imaging features of cases in which there was disagreement, images were divided into four groups: agreement on the presence of honeycombing, agreement on the absence of honeycombing, disagreement on the presence of honeycombing, and other (none of the preceding three groups applied).
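As an illustration of the agreement statistic named above, the following is a minimal sketch of Cohen weighted κ for one observer's five-point scores against a reference score per image. It rests on several assumptions not stated in the text: the function name `weighted_kappa` is hypothetical, linear disagreement weights are used by default (the weighting scheme is not specified here), the reference standard is taken to be a single score per image, and the score lists in the usage lines are invented purely to exercise the function.

```python
def weighted_kappa(observer, reference, categories=(1, 2, 3, 4, 5), weighting="linear"):
    """Cohen weighted kappa between two lists of ordinal scores.

    Assumption: "linear" weights |i - j| / (k - 1) or "quadratic" weights
    (|i - j| / (k - 1)) ** 2; the source does not say which was used.
    """
    if len(observer) != len(reference) or not observer:
        raise ValueError("score lists must be non-empty and of equal length")
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")

    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(observer)

    # Observed joint proportions of (observer score, reference score).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(observer, reference):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    power = 1 if weighting == "linear" else 2
    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            w = (abs(i - j) / (k - 1)) ** power
            observed_disagreement += w * observed[i][j]
            expected_disagreement += w * row[i] * col[j]

    if expected_disagreement == 0.0:
        # Both raters used one identical category throughout; kappa is undefined.
        return float("nan")
    return 1.0 - observed_disagreement / expected_disagreement


# Hypothetical scores for eight images (not data from the study).
reference_scores = [5, 4, 1, 2, 3, 5, 1, 2]
observer_scores = [5, 3, 1, 2, 4, 4, 2, 2]
kappa = weighted_kappa(observer_scores, reference_scores)
```

Under this sketch, one κ value would be computed per observer, and those values could then be compared across subspecialty or regional groups.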
Results: Agreement between the 43 observers' honeycombing scores and the reference standard was moderate (Cohen weighted κ values: 0.40-0.58). There were no significant differences in κ values among groups defined by either subspecialty or geographic region (Tukey-Kramer test, P = .38 to >.99). In 29% of cases, there was disagreement on the identification of honeycombing. These cases included honeycombing mixed with traction bronchiectasis, large cysts, and superimposed pulmonary emphysema.
Conclusion: Identification of honeycombing at CT is subjective, and disagreement is largely caused by conditions that mimic honeycombing.