Purpose: To characterize the variability in radiologists' interpretations of computed tomography (CT) studies in the National Lung Screening Trial (NLST), including false-positive rates (FPRs) and sensitivity; to examine factors that contribute to that variability; and to evaluate trade-offs between FPRs and sensitivity among different groups of radiologists.
Materials and methods: The HIPAA-compliant NLST was approved by the institutional review board at each screening center; all participants provided informed consent. NLST radiologists reported overall screening results, nodule-specific findings, and recommendations for diagnostic follow-up. A noncalcified nodule of 4 mm or larger constituted a positive screening result. The FPR was defined as the rate of positive screening examinations in participants without a cancer diagnosis within 1 year. Descriptive analyses and mixed-effects models were used. The average odds ratio (OR) for a false-positive result across all pairs of radiologists was used as a measure of variability.
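As an illustration of the FPR definition above, the following minimal Python sketch computes a per-radiologist FPR from exam-level records. The record layout, the function names, and the toy data are assumptions made for this example only and are not taken from the NLST data set. The crude pairwise odds ratio shown is a simple descriptive quantity, not the mixed-effects model estimate reported in the study.

```python
from collections import defaultdict


def false_positive_rates(exams):
    """Return {reader_id: FPR} from (reader_id, positive_screen, cancer_within_1yr) tuples.

    FPR = positive screens / all screens, restricted to participants
    without a cancer diagnosis within 1 year (hypothetical record layout).
    """
    negatives = defaultdict(int)  # exams in participants without cancer
    false_pos = defaultdict(int)  # positive screens among those exams
    for reader_id, positive_screen, cancer_within_1yr in exams:
        if cancer_within_1yr:
            continue  # cancer cases do not enter the FPR denominator
        negatives[reader_id] += 1
        if positive_screen:
            false_pos[reader_id] += 1
    return {reader: false_pos[reader] / n for reader, n in negatives.items()}


def crude_odds_ratio(fpr_a, fpr_b):
    """Crude odds ratio of a false-positive result, reader B versus reader A.

    Requires both rates to be strictly between 0 and 1.
    """
    odds_a = fpr_a / (1 - fpr_a)
    odds_b = fpr_b / (1 - fpr_b)
    return odds_b / odds_a


# Toy data, invented for illustration only.
toy_exams = [
    ("A", True, False), ("A", False, False), ("A", True, True), ("A", False, False),
    ("B", True, False), ("B", True, False), ("B", False, False), ("B", False, True),
]
rates = false_positive_rates(toy_exams)
pair_or = crude_odds_ratio(rates["A"], rates["B"])
```

In this toy example each reader has three cancer-free exams; reader A calls one positive and reader B calls two, so reader B's crude odds of a false-positive result are about four times those of reader A.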
Results: One hundred twelve radiologists at 32 screening centers each interpreted 100 or more NLST CT studies, together accounting for 72 160 of the 75 126 total NLST CT studies. The mean FPR for radiologists was 28.7% ± 13.7 (standard deviation), with a range of 3.8%-69.0%. The model yielded an average OR of 2.49 across all pairs of radiologists and an OR of 1.83 for pairs within the same screening center. Mean FPRs were similar for academic versus nonacademic centers (27.9% and 26.7%, respectively) and for centers inside (25.0%) versus outside (28.7%) the U.S. "histoplasmosis belt." Aggregate sensitivity was 96.5% for radiologists with FPRs higher than the median (27.1%), compared with 91.9% for those with FPRs lower than the median (P = .02).
Conclusion: There was substantial variability in radiologists' FPRs. Higher FPRs were associated with modestly higher sensitivity.