Development and validation of a deep learning-based model to distinguish acetabular fractures on pelvic anteroposterior radiographs

Pengyu Ye; Sihe Li; Zhongzheng Wang; Siyu Tian; Yi Luo; Zhanyong Wu; Yan Zhuang; Yingze Zhang; Marcin Grzegorzek; Zhiyong Hou

doi:10.3389/fphys.2023.1146910

Front Physiol. 2023 Apr 28:14:1146910. doi: 10.3389/fphys.2023.1146910. eCollection 2023.

Authors

Pengyu Ye¹, Sihe Li², Zhongzheng Wang¹, Siyu Tian¹, Yi Luo³, Zhanyong Wu⁴, Yan Zhuang⁵, Yingze Zhang¹, Marcin Grzegorzek², Zhiyong Hou¹

Affiliations

¹ Third Hospital of Hebei Medical University, Shijiazhuang, Hebei, China.
² University of Lübeck, Lübeck, Schleswig-Holstein, Germany.
³ Heidelberg University, Heidelberg, Baden-Württemberg, Germany.
⁴ Orthopedic Hospital of Xingtai, Xingtai, China.
⁵ Xi'an Honghui Hospital, Xi'an, Shaanxi, China.

Abstract

Objective: To develop and test a deep learning (DL) model to distinguish acetabular fractures (AFs) on pelvic anteroposterior radiographs (PARs) and compare its performance to that of clinicians.

Materials and methods: A total of 1,120 patients from a large level I trauma center were enrolled and allocated at a 3:1 ratio for the DL model's development and internal testing. Another 86 patients from two independent hospitals were included for external validation. A DL model for identifying AFs was constructed based on DenseNet. AFs were classified into types A, B, and C according to the three-column classification theory. Ten clinicians were recruited for AF detection. A potential misdiagnosed case (PMC) was defined based on the clinicians' detection results. The detection performance of the clinicians and the DL model was evaluated and compared. The DL model's detection performance for each fracture subtype was assessed using the area under the receiver operating characteristic curve (AUC).

Results: The mean sensitivity, specificity, and accuracy of the 10 clinicians in identifying AFs were 0.750/0.735, 0.909/0.909, and 0.829/0.822 in the internal test/external validation sets, respectively. The sensitivity, specificity, and accuracy of the DL detection model in the same sets were 0.926/0.872, 0.978/0.988, and 0.952/0.930, respectively. The DL model identified type A fractures with an AUC of 0.963 [95% confidence interval (CI): 0.927-0.985]/0.950 (95% CI: 0.867-0.989); type B fractures with an AUC of 0.991 (95% CI: 0.967-0.999)/0.989 (95% CI: 0.930-1.000); and type C fractures with an AUC of 1.000 (95% CI: 0.975-1.000)/1.000 (95% CI: 0.897-1.000) in the internal test/external validation sets. The DL model correctly recognized 56.5% (26/46) of PMCs.

Conclusion: A DL model for distinguishing AFs on PARs is feasible. In this study, the DL model achieved diagnostic performance comparable or even superior to that of clinicians.
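For readers less familiar with the metrics reported in this abstract, the following minimal Python sketch shows the standard definitions of sensitivity, specificity, and accuracy for a binary detection task. The function name `detection_metrics` and the confusion-matrix counts are hypothetical and chosen only for illustration; they are not the study's data. The only figure taken from the abstract is the PMC recognition count (26 of 46).

```python
def detection_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp: fractures correctly flagged      fn: fractures missed
    tn: normal cases correctly cleared   fp: normal cases wrongly flagged
    """
    sensitivity = tp / (tp + fn)                 # share of true fractures detected
    specificity = tn / (tn + fp)                 # share of non-fractures cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # share of all cases classified correctly
    return sensitivity, specificity, accuracy


# Hypothetical counts for illustration only (NOT taken from the study).
sens, spec, acc = detection_metrics(tp=90, fn=10, tn=95, fp=5)

# PMC recognition rate, using the counts stated in the abstract (26 of 46).
pmc_rate = 26 / 46

print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
print(f"PMC recognition rate = {pmc_rate:.1%}")
```

Reporting sensitivity and specificity alongside accuracy matters because accuracy alone can mask missed fractures when fracture and non-fracture cases are unevenly represented.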

Keywords: DenseNet; acetabular fracture; deep learning; diagnosis; pelvic anteroposterior radiograph.

Grants and funding

This study was funded by the National Natural Science Foundation of China (Grant No. 82072523).