Diagnostic performance of a deep learning-based method in differentiating malignant from benign subcentimeter (≤10 mm) solid pulmonary nodules

Jianing Liu; Linlin Qi; Yawen Wang; Fenglan Li; Jiaqi Chen; Sainan Cheng; Zhen Zhou; Yizhou Yu; Jianwei Wang

doi:10.21037/jtd-23-985


J Thorac Dis. 2023 Oct 31;15(10):5475-5484. doi: 10.21037/jtd-23-985. Epub 2023 Sep 19.

Authors

Jianing Liu¹, Linlin Qi¹, Yawen Wang¹, Fenglan Li¹, Jiaqi Chen¹, Sainan Cheng¹, Zhen Zhou², Yizhou Yu³, Jianwei Wang¹

Affiliations

¹ Department of Diagnostic Radiology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China.
² Beijing Deepwise & League of PhD Technology Co., Ltd., Beijing, China.
³ Department of Computer Science, The University of Hong Kong, Hong Kong, China.

Abstract

Background: This study assessed the diagnostic performance of a deep learning (DL)-based model in differentiating malignant from benign subcentimeter (≤10 mm) solid pulmonary nodules (SSPNs) on computed tomography (CT) images, compared with that of radiologists with 10 and 15 years of experience in thoracic imaging (medium-to-senior seniority).

Methods: Overall, 200 SSPNs (100 benign and 100 malignant) were retrospectively collected. Malignancy was confirmed by pathology, and benignity was confirmed by follow-up or pathology. CT images were fed into the DL model to obtain the probability of malignancy (range, 0-100%) for each nodule. Based on the diagnostic results, each nodule was classified as benign, malignant, or indeterminate. The accuracy and diagnostic composition (the proportions of benign, malignant, and indeterminate results) of the model were compared with those of the radiologists using the McNemar-Bowker test. By diameter, the enrolled nodules were divided into 3-6-, 6-8-, and 8-10-mm subgroups, and within each subgroup the diagnostic results of the model were compared with those of the radiologists.
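The Methods chain three steps that a short sketch can make concrete: turning the model's malignancy probability into a three-way call, cross-tabulating the model's calls against the radiologists' calls for the same nodules, and testing that paired table with the McNemar-Bowker test of symmetry. This is a minimal illustration under stated assumptions, not the authors' pipeline. The abstract does not report the probability cut-offs, so `low=30` and `high=70` (percent) are hypothetical placeholders, and the helper names `categorize`, `paired_table`, `chi2_sf`, and `mcnemar_bowker` are introduced here for illustration only. The chi-square tail probability uses only the standard library (closed forms for integer degrees of freedom).

```python
import math
from collections import Counter

CATEGORIES = ("benign", "malignant", "indeterminate")


def categorize(prob_malignant_pct, low=30.0, high=70.0):
    """Map a malignancy probability (0-100%) to a three-way call.

    HYPOTHETICAL cut-offs: the study's actual thresholds are not stated.
    """
    if prob_malignant_pct < low:
        return "benign"
    if prob_malignant_pct > high:
        return "malignant"
    return "indeterminate"


def paired_table(calls_a, calls_b, categories=CATEGORIES):
    """k x k table: rows = reader A's call, columns = reader B's call, same nodules."""
    counts = Counter(zip(calls_a, calls_b))
    return [[counts[(r, c)] for c in categories] for r in categories]


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # exp(-x/2) * sum_{i=0}^{df/2-1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    tail = math.erfc(math.sqrt(x / 2))  # this alone is the df = 1 case
    if df == 1:
        return tail
    # Odd df: add sqrt(2x/pi) * exp(-x/2) * sum_i x^i / (1*3*...*(2i+1))
    term, total = 1.0, 1.0
    for i in range(1, (df - 1) // 2):
        term *= x / (2 * i + 1)
        total += term
    return tail + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total


def mcnemar_bowker(table):
    """Bowker's test of symmetry; returns (statistic, df, p-value)."""
    k = len(table)
    stat = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            discordant = table[i][j] + table[j][i]
            if discordant:  # skip empty pairs to avoid 0/0
                stat += (table[i][j] - table[j][i]) ** 2 / discordant
    df = k * (k - 1) // 2
    return stat, df, chi2_sf(stat, df)


if __name__ == "__main__":
    # Invented toy inputs, not study data.
    model_calls = [categorize(p) for p in (12.0, 85.0, 50.0, 91.0)]
    reader_calls = ["indeterminate", "malignant", "indeterminate", "indeterminate"]
    table = paired_table(model_calls, reader_calls)
    print(table, mcnemar_bowker(table))
```

With only two categories (for example, correct vs. not correct), the same statistic reduces to McNemar's test without continuity correction. Category pairs with no discordant nodules are skipped while the degrees of freedom stay at k(k-1)/2; some software instead reduces the degrees of freedom or reports the test as undefined, so p-values can differ slightly between tools.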

Results: The accuracy of the DL model in differentiating malignant from benign SSPNs was significantly higher than that of the radiologists (71.5% vs. 38.5%, P<0.001). The DL model reported more definitive (benign or malignant) results and fewer indeterminate results. In subgroup analyses by nodule size, the DL model also outperformed the radiologists and gave fewer indeterminate results. In the 3-6-, 6-8-, and 8-10-mm subgroups, the accuracy of the DL model vs. the radiologists was 75.5% vs. 28.3% (P<0.001), 62.0% vs. 28.2% (P<0.001), and 77.6% vs. 55.3% (P=0.001), respectively, and the proportion of indeterminate results was 3.8% vs. 66.0%, 8.5% vs. 66.2%, and 2.6% vs. 35.5% (all P<0.001), respectively.
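The subgroup figures in the Results are proportions over all nodules within each size stratum. The following sketch shows one way to compute them; it is a hypothetical illustration, not the study's code, and any records passed to it would be invented toy data. It assumes an indeterminate call counts as not correct (the reported percentages are consistent with this, since accuracy plus indeterminate rate never exceeds 100% in any subgroup), and it assumes half-open size bins because the abstract does not say which subgroup a nodule measuring exactly 6 or 8 mm joins. `Nodule`, `size_group`, `summarize`, and `by_size` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Nodule:
    diameter_mm: float
    truth: str  # reference standard: "benign" or "malignant"
    call: str   # reader/model call: "benign", "malignant", or "indeterminate"


def size_group(diameter_mm):
    """ASSUMED half-open bins: [3, 6), [6, 8), [8, 10] mm."""
    if diameter_mm < 6:
        return "3-6 mm"
    if diameter_mm < 8:
        return "6-8 mm"
    return "8-10 mm"


def summarize(nodules):
    """Accuracy and indeterminate rate over a non-empty list of nodules.

    An indeterminate call never matches the truth, so it counts as not correct.
    """
    n = len(nodules)
    correct = sum(x.call == x.truth for x in nodules)
    indeterminate = sum(x.call == "indeterminate" for x in nodules)
    return {"n": n, "accuracy": correct / n, "indeterminate_rate": indeterminate / n}


def by_size(nodules):
    """Group nodules by size bin and summarize each bin separately."""
    groups = {}
    for x in nodules:
        groups.setdefault(size_group(x.diameter_mm), []).append(x)
    return {name: summarize(members) for name, members in groups.items()}
```

Because each subgroup metric is a proportion, the overall accuracy equals the size-weighted mean of the subgroup accuracies (the sum of n_k × acc_k over subgroups, divided by the total N), so pooled and stratified figures can be cross-checked once the subgroup counts are known.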

Conclusions: The DL-based method outperformed the radiologists in differentiating malignant from benign SSPNs. This DL model may reduce diagnostic uncertainty and improve diagnostic accuracy, especially for SSPNs smaller than 8 mm.

Keywords: Computed tomography (CT); artificial intelligence (AI); deep learning (DL); differential diagnosis; solitary pulmonary nodule.