Performance and Usability of Code-Free Deep Learning for Chest Radiograph Classification, Object Detection, and Segmentation

Radiol Artif Intell. 2023 Feb 15;5(2):e220062. doi: 10.1148/ryai.220062. eCollection 2023 Mar.

Abstract

Purpose: To evaluate the performance and usability of code-free deep learning (CFDL) platforms in creating DL models for disease classification, object detection, and segmentation on chest radiographs.

Materials and methods: Six CFDL platforms were evaluated in this retrospective study (September 2021). Single- and multilabel classifiers were trained for thoracic pathologic conditions using Guangzhou pediatric and NIH-CXR14 (ie, National Institutes of Health ChestX-ray14) datasets, and external testing was performed using subsets of NIH-CXR14 and Stanford CheXpert datasets, respectively. Pneumonia detection and pneumothorax segmentation models were trained using the Radiological Society of North America (RSNA) Pneumonia and Society for Imaging Informatics in Medicine (SIIM) Pneumothorax datasets, respectively. Model performance was evaluated using F1 scores. Usability was evaluated based on feasibility of image uploading and model training, ease of use, and cost.
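For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall, which is equivalent to 2TP / (2TP + FP + FN). The following is a minimal sketch for a single binary label, not the evaluation code used in the study. The function names and the example labels are hypothetical, and returning 0.0 when the score is undefined is an assumed convention.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int]:
    """Count true positives, false positives, and false negatives for one binary label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is zero (assumed convention)."""
    denom = 2 * tp + fp + fn
    if denom == 0:
        return 0.0
    return 2 * tp / denom


# Hypothetical labels for illustration only (1 = finding present, 0 = absent).
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1]

tp, fp, fn = confusion_counts(y_true, y_pred)
print(f1_score(tp, fp, fn))  # 0.75
```

Under this definition, a model that produces no true positives for a label scores an F1 of 0.00 regardless of how many negatives it classifies correctly, because true negatives do not enter the formula.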

Results: NIH-CXR14 and CheXpert datasets contained 112 120 (mean age, 47 years ± 17 [SD]; 63 340 male patients) and 151 522 images (mean age, 61 years ± 18; 88 931 male patients), respectively. The other datasets did not report demographics (Guangzhou, 5826 images; RSNA, 26 683 images; SIIM, 15 301 images). Six platforms offered single-label classifiers, four multilabel classifiers, five object detection models, and one segmentation model. Guangzhou pneumonia classifiers demonstrated good internal (F1, 0.93-0.99) and poor external (F1, 0.39-0.44) performance. Multilabel NIH-CXR14 classifiers showed poor internal and external performance (F1, 0.00-0.36 and 0.00-0.76, respectively). NIH-CXR14 single-label classifiers performed poorly (F1, 0.00, all). The single successfully trained pneumonia detection model had an F1 score of 0.48. No segmentation model was successfully trained. Platform usability was limited, with all requiring some type of coded solution.

Conclusion: CFDL platforms demonstrated limited performance and usability for chest radiograph analysis.

Supplemental material is available for this article. © RSNA, 2023.

Keywords: Artificial Intelligence; Automated Machine Learning; Chest Radiographs; Code-Free Deep Learning; Deep Learning; Pneumonia; Pneumothorax; Radiology.