Independent Head-to-Head Comparison of Commercial Artificial Intelligence Devices for Lung Cancer Detection on Chest Radiographs

Radiology. 2026 May;319(2):e252205. doi: 10.1148/radiol.252205.

Abstract

Background: Various commercial artificial intelligence (AI) devices can identify lung cancer features on chest radiographs. Understanding their relative performance within the same patient population and health care setting is essential for making informed clinical deployment decisions.

Purpose: To determine and compare the diagnostic accuracy of multiple commercial AI devices for lung cancer detection on chest radiographs in a patient sample with a representative prevalence of lung cancer.

Materials and Methods: Consecutive chest radiographs of adult patients, requested from primary care for any indication and acquired at a single United Kingdom center between July 2020 and February 2021, were eligible for inclusion. Each radiograph was independently analyzed by each device. The multidisciplinary team decision served as the reference standard for diagnosis. Receiver operating characteristic analysis was performed for continuous scores, and the DeLong test was used to compare the area under the receiver operating characteristic curve (AUC) between devices. Contingency tables of device classification results were used to calculate diagnostic accuracy metrics. The Cochran Q and McNemar tests were used to compare proportions of classification results between the devices. Fleiss κ was used to assess the agreement of classification results across the devices.

Results: A total of 5235 radiographs were obtained from 5235 patients (median age, 60 years; 53.4% female; 79.4% White; 1.4% diagnosed with lung cancer with a visible tumor). Devices from seven manufacturers were tested. The AUC varied across devices (0.80-0.94; P < .05 for nine of 15 pairwise comparisons). Sensitivity (20.8%-77.8%), specificity (58.9%-98.4%), and positive predictive value (1.5%-28.4%) also varied (P < .001; P < .05 for 39 of 44 pairwise comparisons). The number of additional false-positive results for tumor detection compared with radiologist reporting ranged from 10 to 2039 across devices. Device classification results showed minimal agreement (κ = 0.24).

Conclusion: There was clinically and statistically significant variability in the diagnostic accuracy of commercial AI devices for lung cancer detection at chest radiography.

© RSNA, 2026

Supplemental material is available for this article.

See also the editorial by Schaefer-Prokop and Schalekamp in this issue.
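The abstract compares device AUCs with the DeLong test, which accounts for the fact that every device scored the same radiographs. The abstract does not describe the authors' analysis software, so the following is only a minimal standard-library sketch of DeLong's nonparametric method, assuming each device returns one continuous score per radiograph and that labels are 1 for lung cancer per the reference standard and 0 otherwise. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _psi(x: float, y: float) -> float:
    """Mann-Whitney kernel: 1 if the cancer case scores higher, 0.5 on a tie."""
    if x > y:
        return 1.0
    if x == y:
        return 0.5
    return 0.0


def _components(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, List[float], List[float]]:
    """AUC plus DeLong structural components (V10 over positives, V01 over negatives)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    v10 = [sum(_psi(x, n) for n in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, n) for x in pos) / len(pos) for n in neg]
    return sum(v10) / len(v10), v10, v01


def _cov(a: Sequence[float], b: Sequence[float]) -> float:
    """Sample covariance with an (n - 1) denominator."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(scores_a: Sequence[float], scores_b: Sequence[float], labels: Sequence[int]):
    """Two-sided DeLong test of AUC_a == AUC_b for two devices read on the same cases.

    Returns (auc_a, auc_b, z, p). Needs at least two positive and two negative cases.
    """
    auc_a, v10_a, v01_a = _components(scores_a, labels)
    auc_b, v10_b, v01_b = _components(scores_b, labels)
    m, n = len(v10_a), len(v01_a)  # numbers of positive and negative cases
    var_a = _cov(v10_a, v10_a) / m + _cov(v01_a, v01_a) / n
    var_b = _cov(v10_b, v10_b) / m + _cov(v01_b, v01_b) / n
    cov_ab = _cov(v10_a, v10_b) / m + _cov(v01_a, v01_b) / n
    var_diff = var_a + var_b - 2.0 * cov_ab
    if var_diff <= 0.0:
        # Zero estimated variance (e.g. identical score rankings): z is undefined.
        raise ValueError("AUC difference has zero estimated variance")
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return auc_a, auc_b, z, p
```

The covariance term `cov_ab` is what makes this a paired comparison; dropping it would treat two devices' AUCs as if they came from different patients. This direct version is quadratic in the number of cases, so a sample of several thousand radiographs would usually call for a sort-based implementation instead.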
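Sensitivity, specificity, and positive predictive value follow directly from each device's 2 × 2 contingency table against the reference standard. The sketch below is illustrative only. It assumes binary device calls (1 = flagged as suspicious for cancer), and its `additional_false_positives` helper assumes that "additional false-positive results compared with radiologist reporting" means the device's false-positive count minus the radiologists' false-positive count on the same radiographs; the abstract does not give the exact definition. The counts in `example` are hypothetical and are not study data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Confusion:
    """2x2 contingency table of one reader's calls against the reference standard."""

    tp: int  # flagged, cancer per reference standard
    fp: int  # flagged, no cancer
    fn: int  # not flagged, cancer
    tn: int  # not flagged, no cancer

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)


def tabulate(flags: Sequence[int], reference: Sequence[int]) -> Confusion:
    """Count binary flags (1 = suspicious) against reference labels (1 = cancer)."""
    pairs = list(zip(flags, reference))
    return Confusion(
        tp=sum(1 for f, r in pairs if f == 1 and r == 1),
        fp=sum(1 for f, r in pairs if f == 1 and r == 0),
        fn=sum(1 for f, r in pairs if f == 0 and r == 1),
        tn=sum(1 for f, r in pairs if f == 0 and r == 0),
    )


def additional_false_positives(device: Confusion, radiologist: Confusion) -> int:
    """Hypothetical definition: device false positives minus radiologist false positives."""
    return device.fp - radiologist.fp


# Hypothetical counts (not from the study): 1000 radiographs, 14 cancers (1.4% prevalence).
example = Confusion(tp=10, fp=100, fn=4, tn=886)
```

In this hypothetical table, sensitivity is about 0.71 and specificity about 0.90, yet PPV is only about 0.09, because 100 of the 110 flags fall on the 986 cancer-free radiographs. At a prevalence near 1.4%, even small differences in specificity translate into large differences in false-positive counts.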
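The Cochran Q test compares the proportion of positive calls across all devices at once, and the McNemar test compares two devices on the same radiographs using only the cases where they disagree. The abstract does not say which McNemar variant was used, so the sketch below uses the exact (binomial) form as an assumption. It also assumes a radiograph-by-device matrix of 0/1 calls, and the function names are hypothetical.

```python
import math
from typing import Sequence


def mcnemar_exact(calls_a: Sequence[int], calls_b: Sequence[int]) -> float:
    """Two-sided exact McNemar p-value for paired binary calls from two devices."""
    b = sum(1 for x, y in zip(calls_a, calls_b) if x == 1 and y == 0)  # only A flags
    c = sum(1 for x, y in zip(calls_a, calls_b) if x == 0 and y == 1)  # only B flags
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: the devices never disagree
    # Under H0 each discordant pair is equally likely to go either way: Binomial(n, 0.5).
    tail = sum(math.comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def cochran_q(calls: Sequence[Sequence[int]]) -> float:
    """Cochran Q statistic; calls[i][j] is device j's binary call on radiograph i.

    Under H0 (equal positive rates), Q is approximately chi-square with k - 1 df.
    """
    k = len(calls[0])
    col = [sum(row[j] for row in calls) for j in range(k)]  # positives per device
    row_tot = [sum(row) for row in calls]  # devices flagging each radiograph
    total = sum(row_tot)
    denom = k * total - sum(r * r for r in row_tot)
    if denom == 0:
        return 0.0  # every radiograph flagged by all or none of the devices
    return (k - 1) * (k * sum(c * c for c in col) - total ** 2) / denom
```

With two devices, Q reduces to the uncorrected McNemar chi-square (b - c)^2 / (b + c). The p-value for Q can be read from a chi-square distribution with k - 1 degrees of freedom, for example with `scipy.stats.chi2.sf`.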
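Fleiss κ measures how far agreement among several raters exceeds what their pooled call rates would produce by chance. In the study, each device acts as a rater. The sketch below shows the standard Fleiss formula specialized to binary calls. It assumes a radiograph-by-device 0/1 matrix, and the function name is hypothetical.

```python
from typing import Sequence


def fleiss_kappa_binary(calls: Sequence[Sequence[int]]) -> float:
    """Fleiss kappa for binary calls; calls[i][j] is device j's call (0/1) on radiograph i."""
    n_subjects = len(calls)
    n_raters = len(calls[0])
    p_i = []
    positives = 0
    for row in calls:
        pos = sum(row)
        neg = n_raters - pos
        positives += pos
        # Observed agreement on this radiograph: share of agreeing device pairs.
        p_i.append((pos * pos + neg * neg - n_raters) / (n_raters * (n_raters - 1)))
    p_bar = sum(p_i) / n_subjects
    # Chance agreement from the pooled proportion of positive calls.
    p_pos = positives / (n_subjects * n_raters)
    p_e = p_pos ** 2 + (1 - p_pos) ** 2
    if p_e == 1.0:
        raise ValueError("kappa is undefined when every call falls in one category")
    return (p_bar - p_e) / (1 - p_e)
```

A value of 1 indicates perfect agreement and 0 indicates chance-level agreement. Because low lung cancer prevalence makes most calls negative, chance agreement is high, and disagreement on the relatively few flagged radiographs can pull κ down substantially.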

Publication types

  • Comparative Study

MeSH terms

  • Adult
  • Aged
  • Aged, 80 and over
  • Artificial Intelligence*
  • Female
  • Humans
  • Lung / diagnostic imaging
  • Lung Neoplasms* / diagnostic imaging
  • Male
  • Middle Aged
  • Radiography, Thoracic* / methods
  • Retrospective Studies
  • Sensitivity and Specificity
  • United Kingdom