Purpose: To evaluate agreement among radiologists on the interpretation of pulmonary findings at low-dose computed tomographic (CT) screening examinations for lung cancer.
Materials and methods: Institutional review board approval and informed consent were obtained. HIPAA guidelines were followed. Sixteen radiologists from the 10 National Lung Screening Trial screening centers of the National Cancer Institute's Lung Screening Study network reviewed image subsets from 135 baseline low-dose screening CT examinations in 135 trial participants (89 men, 46 women; mean age, 62.7 years ± 5.4 [standard deviation]). Each interpretation was classified into one of the following four categories: noncalcified nodule 4 mm or larger in greatest transverse dimension (positive screening result); noncalcified nodule smaller than 4 mm in greatest transverse dimension (negative screening result); calcified, benign nodule (negative screening result); or no nodule (negative screening result). A recommendation for follow-up evaluation was obtained for each case. Interobserver agreement was evaluated by using the multirater kappa statistic and by using response frequencies and descriptive statistics.
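As an illustration of the multirater kappa statistic named above, the following is a minimal sketch that assumes the Fleiss formulation; the abstract does not state which multirater variant was used. The function name `fleiss_kappa` and the toy count table are hypothetical and are not study data.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa (assumed variant; the abstract says only "multirater kappa").

    counts[i][j] is the number of readers who assigned case i to category j.
    Every case must be rated by the same number of readers.
    """
    n_cases = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every case must have the same number of ratings")
    n_categories = len(counts[0])

    # Share of all assignments that fall in each category.
    p_j = [
        sum(row[j] for row in counts) / (n_cases * n_raters)
        for j in range(n_categories)
    ]
    # Per-case agreement: fraction of reader pairs that chose the same category.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    observed = sum(p_i) / n_cases          # mean observed agreement
    expected = sum(p * p for p in p_j)     # agreement expected by chance
    return (observed - expected) / (1 - expected)


# Toy table (not study data): 3 readers, 2 cases, categories = [positive, negative].
toy_counts = [[2, 1], [1, 2]]
toy_kappa = fleiss_kappa(toy_counts)
```

In this formulation kappa compares the observed fraction of agreeing reader pairs with the fraction expected if readers assigned categories at the overall marginal rates, so a value of 1 indicates perfect agreement and 0 indicates agreement no better than chance.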
Results: Multirater kappa values ranged from 0.58 (for agreement among all four classifications; 95% confidence interval: 0.55, 0.61) to 0.64 (for agreement on classification as a positive or negative screening result; 95% confidence interval: 0.62, 0.65). The average percentage of reader pairs in agreement on the screening result per case (percentage agreement) was 82%. The total number of abnormalities detected and classified as pulmonary nodules varied widely, differing by more than twofold among radiologists. For cases classified as positive, multirater kappa for follow-up recommendations was 0.35.
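The percentage agreement reported above (the share of reader pairs agreeing on the screening result for each case, averaged over cases) can be sketched as follows. This is an illustrative reading of that definition, not the authors' code; the function name `mean_pairwise_agreement` and the toy ratings are hypothetical and are not study data.

```python
from itertools import combinations
from typing import Hashable, Sequence


def mean_pairwise_agreement(ratings: Sequence[Sequence[Hashable]]) -> float:
    """Average, over cases, of the fraction of reader pairs giving the same result.

    ratings[i] holds one label per reader for case i (at least two readers).
    """
    per_case = []
    for labels in ratings:
        pairs = list(combinations(labels, 2))   # every unordered reader pair
        agreeing = sum(a == b for a, b in pairs)
        per_case.append(agreeing / len(pairs))
    return sum(per_case) / len(per_case)


# Toy ratings (not study data): 3 readers, 2 cases.
toy_ratings = [
    ["positive", "positive", "positive"],  # all 3 pairs agree
    ["positive", "negative", "negative"],  # 1 of 3 pairs agrees
]
toy_agreement = mean_pairwise_agreement(toy_ratings)
```

Unlike kappa, this measure is not corrected for chance, which is one reason a percentage agreement of 82% can coexist with a kappa of 0.64 for the same positive-versus-negative classification.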
Conclusion: Interobserver agreement was moderate to substantial; potential for considerable improvement exists. Clinical trial registration no. NCT00047385.