Artificial Intelligence in Chest Radiography Reporting Accuracy: Added Clinical Value in the Emergency Unit Setting Without 24/7 Radiology Coverage

Invest Radiol. 2022 Feb 1;57(2):90-98. doi: 10.1097/RLI.0000000000000813.

Abstract

Objectives: Chest radiographs (CXRs) are commonly performed in emergency units (EUs), but their interpretation requires radiology experience. We developed a precommercial artificial intelligence (AI) system that aims to mimic board-certified radiologists' (BCRs') performance and could therefore support non-radiology residents (NRRs) in clinical settings lacking 24/7 radiology coverage. We validated the system by quantifying its clinical value for radiology residents (RRs) and EU-experienced NRRs in a clinically representative EU setting.

Materials and methods: A total of 563 EU CXRs were retrospectively assessed by 3 BCRs, 3 RRs, and 3 EU-experienced NRRs. Each reader separately reported suspected pathologies (pleural effusion, pneumothorax, consolidations suspicious for pneumonia, lung lesions) on a 5-step confidence scale, giving 20,268 reported pathology suspicions in total (563 images × 9 readers × 4 pathologies). Board-certified radiologists' confidence scores were converted into 4 binary reference standards (RFSs) of different sensitivities. The RRs' and NRRs' performances were statistically compared with that of our AI system (trained on nonpublic data from different clinical sites) based on receiver operating characteristics (ROCs) and operating point metrics approximated to the maximum sum of sensitivity and specificity (Youden statistics).
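
The evaluation steps named above (binary reference standards of differing sensitivity, area under the ROC curve, and a Youden operating point) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The abstract does not state how the BCRs' scores were combined or thresholded, so the simple cut-off rule in `binarize`, the function names, and the toy scores are all hypothetical assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def binarize(scores: Sequence[int], threshold: int) -> List[int]:
    """Assumed rule: a 1-5 confidence score counts as positive if >= threshold.
    A lower threshold yields a more sensitive reference standard."""
    return [1 if s >= threshold else 0 for s in scores]


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Return (cut-off, sensitivity, specificity) for every distinct score used as cut-off."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative case")
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        points.append((t, tp / pos, tn / neg))
    return points


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney formulation:
    P(positive case scores higher than negative case), ties counted as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_point(labels: Sequence[int], scores: Sequence[float]) -> Tuple[float, float, float]:
    """Cut-off that maximizes sensitivity + specificity (Youden's J + 1)."""
    return max(roc_points(labels, scores), key=lambda p: p[1] + p[2])


# Toy data (hypothetical): reference scores and one reader's scores for 8 images.
reference_scores = [1, 1, 2, 3, 4, 5, 1, 2]
reader_scores = [1, 2, 1, 3, 4, 5, 2, 4]

for ref_threshold in (3, 2):  # 2 gives the more sensitive reference standard
    labels = binarize(reference_scores, ref_threshold)
    cutoff, sens, spec = youden_point(labels, reader_scores)
    print(f"reference >= {ref_threshold}: AUC={auc(labels, reader_scores):.3f}, "
          f"Youden cut-off={cutoff}, sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy example, lowering the reference threshold makes the reference standard more sensitive and changes the reader's measured AUC, which is the kind of dependence the study examines across its 4 RFSs.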

Results: The NRRs lost diagnostic accuracy relative to RRs as the BCRs' RFSs became increasingly sensitive, for all considered pathologies. On our external validation data set, the AI system/NRRs' consensus mimicked the most sensitive BCRs' RFSs with areas under the ROC curve of 0.940/0.837 (pneumothorax), 0.953/0.823 (pleural effusion), and 0.883/0.747 (lung lesions); the AI system's performance was comparable to that of experienced RRs and significantly exceeded that of EU-experienced NRRs. For consolidation detection, the AI system performed at the level of the NRRs' consensus (and outperformed each individual NRR), with an area under the ROC curve of 0.847 referenced to the BCRs' most sensitive RFS.

Conclusions: Our AI system matched RRs' performance while significantly outperforming NRRs' diagnostic accuracy for most of the considered CXR pathologies (pneumothorax, pleural effusion, and lung lesions), and therefore might serve as clinical decision support for NRRs.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Artificial Intelligence
  • Emergency Service, Hospital
  • Humans
  • Lung Diseases*
  • Pleural Effusion* / diagnostic imaging
  • Pneumothorax* / diagnostic imaging
  • Radiography
  • Radiography, Thoracic / methods
  • Radiology*
  • Retrospective Studies