Deep learning-based detection system for multiclass lesions on chest radiographs: comparison with observer readings

Eur Radiol. 2020 Mar;30(3):1359-1368. doi: 10.1007/s00330-019-06532-x. Epub 2019 Nov 20.

Abstract

Objective: To investigate the feasibility of a deep learning-based detection (DLD) system for multiclass lesions on chest radiographs, in comparison with observers.

Methods: A total of 15,809 chest radiographs were collected from two tertiary hospitals (7204 normal and 8605 abnormal with nodule/mass, interstitial opacity, pleural effusion, or pneumothorax). Except for the test set of 200 radiographs (100 normal and 100 abnormal: nodule/mass, 70; interstitial opacity, 10; pleural effusion, 10; pneumothorax, 10), the radiographs were used to develop a DLD system for detecting multiclass lesions. The diagnostic performance of the developed model and that of nine observers with varying levels of experience were evaluated and compared using the area under the receiver operating characteristic curve (AUROC) on a per-image basis and the jackknife alternative free-response receiver operating characteristic figure of merit (FOM) on a per-lesion basis. The false-positive fraction was also calculated.
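As an illustration of the per-image metric named above, the AUROC can be read as the probability that a randomly chosen abnormal radiograph receives a higher abnormality score than a randomly chosen normal one, with ties counted as one half. The following is a minimal sketch of that rank-based definition, not the study's evaluation code; the function name `auroc` and the score lists are hypothetical and are not data from the study.

```python
def auroc(scores_normal, scores_abnormal):
    """Rank-based AUROC: fraction of (abnormal, normal) pairs in which the
    abnormal image has the higher score; tied pairs count as 0.5."""
    wins = 0.0
    for a in scores_abnormal:
        for n in scores_normal:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(scores_abnormal) * len(scores_normal))


# Hypothetical per-image abnormality scores (illustrative only).
normal = [0.05, 0.10, 0.20, 0.40]
abnormal = [0.30, 0.60, 0.80, 0.90]

# 15 of the 16 abnormal/normal pairs are ordered correctly.
print(auroc(normal, abnormal))  # 0.9375
```

The pairwise loop is quadratic in the number of images, which is acceptable for a 200-image test set; a sort-based implementation would be preferable for larger sets.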

Results: Compared with the group-averaged observations, the DLD system demonstrated significantly higher performance in both image-wise normal/abnormal classification and lesion-wise detection with pattern classification (AUROC, 0.985 vs. 0.958; p = 0.001; FOM, 0.962 vs. 0.886; p < 0.001). In lesion-wise detection, the DLD system outperformed all nine observers. In the subgroup analysis, the DLD system exhibited consistently better performance for both nodule/mass (FOM, 0.913 vs. 0.847; p < 0.001) and the other three abnormal classes (FOM, 0.995 vs. 0.843; p < 0.001). The false-positive fraction for all abnormalities was 0.11 for the DLD system and 0.19 for the observers.

Conclusions: The DLD system showed the potential for detection of lesions and pattern classification on chest radiographs, performing normal/abnormal classifications and achieving high diagnostic performance.

Key points:

  • The DLD system was feasible for detection with pattern classification of multiclass lesions on chest radiographs.
  • The DLD system had high performance in image-wise classification of chest radiographs as normal or abnormal (AUROC, 0.985) and showed especially high specificity (99.0%).
  • In lesion-wise detection of multiclass lesions, the DLD system outperformed all 9 observers (FOM, 0.962 vs. 0.886; p < 0.001).
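To make the reported specificity concrete, the sketch below applies the standard definition, true negatives divided by all truly normal images. It assumes the 99.0% figure was computed on the 100 normal radiographs of the test set, in which case it would correspond to 99 true negatives and 1 false positive; the abstract does not state these counts, so they are an inference for illustration only, and the function name `specificity` is hypothetical.

```python
def specificity(true_negatives, false_positives):
    """Specificity = TN / (TN + FP), the fraction of normal images called normal."""
    return true_negatives / (true_negatives + false_positives)


# Assumed counts: 100 normal test images, one of them flagged as abnormal.
# These counts are inferred from the reported 99.0%, not stated in the abstract.
spec = specificity(true_negatives=99, false_positives=1)
print(f"{spec:.1%}")  # 99.0%
```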

Keywords: Automated pattern recognition; Classification; Deep learning; Thoracic radiography.

MeSH terms

  • Adult
  • Aged
  • Area Under Curve
  • Deep Learning*
  • Female
  • Humans
  • Lung Diseases / diagnostic imaging*
  • Lung Diseases, Interstitial / diagnostic imaging
  • Lung Neoplasms / diagnostic imaging
  • Male
  • Middle Aged
  • Pleural Diseases / diagnostic imaging*
  • Pleural Effusion / diagnostic imaging
  • Pneumothorax / diagnostic imaging
  • ROC Curve
  • Radiography
  • Radiography, Thoracic / methods*
  • Sensitivity and Specificity
  • Solitary Pulmonary Nodule / diagnostic imaging