Vision Transformer for femur fracture classification

Leonardo Tanzi; Andrea Audisio; Giansalvo Cirrincione; Alessandro Aprato; Enrico Vezzetti

doi:10.1016/j.injury.2022.04.013

Injury. 2022 Jul;53(7):2625-2634. doi: 10.1016/j.injury.2022.04.013. Epub 2022 Apr 19.

Authors

Leonardo Tanzi¹, Andrea Audisio², Giansalvo Cirrincione³, Alessandro Aprato², Enrico Vezzetti⁴

Affiliations

¹ DIGEP, Polytechnic University of Turin, Corso Duca degli Abruzzi 24, Torino 10129, Italy. Electronic address: leonardo.tanzi@polito.it.
² School of Medicine, University of Turin, Torino 10133, Italy.
³ LTI University of Picardie Jules Verne, Amiens 80000, France.
⁴ DIGEP, Polytechnic University of Turin, Corso Duca degli Abruzzi 24, Torino 10129, Italy.

PMID: 35469638
DOI: 10.1016/j.injury.2022.04.013

Abstract

Introduction: In recent years, the scientific community has focused on developing Computer-Aided Diagnosis (CAD) tools, primarily based on Convolutional Neural Networks (CNNs), to improve clinicians' diagnosis of bone fractures. However, accuracy in discriminating fracture subtypes has remained far from optimal. The aim of this study was 1) to evaluate a new CAD system based on Vision Transformers (ViT), a very recent and powerful deep learning technique, and 2) to assess whether clinicians' diagnostic accuracy could be improved using this system.

Materials and methods: A total of 4207 manually annotated images were used, divided into fracture types according to the AO/OTA classification. The ViT architecture was compared with a classic CNN and with a multistage architecture composed of successive CNNs. To demonstrate the reliability of this approach, (1) attention maps were used to visualize the most relevant areas of the images, (2) the performance of a generic CNN and the ViT was compared through unsupervised learning techniques, and (3) 11 clinicians were asked to evaluate and classify 150 images of proximal femur fractures with and without the help of the ViT, and the results were compared for potential improvement.
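As a general illustration of how a Vision Transformer reads an image, the sketch below splits a small grayscale image into non-overlapping square patches and flattens each one into a vector, which is the tokenization step that precedes self-attention. It is a minimal, pure-Python sketch and not the authors' implementation: the function name `split_into_patches`, the 4x4 toy image, and the patch size of 2 are all hypothetical, and the abstract does not state the image or patch sizes actually used.

```python
def split_into_patches(image, patch):
    """Split a 2D image (list of rows) into flattened square patches.

    Patches are returned in row-major order (left to right, top to bottom).
    Illustrative only: image and patch sizes here are assumptions.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image dimensions must be divisible by the patch size")

    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Flatten one patch x patch block into a single vector (one token).
            flat = [
                image[top + dy][left + dx]
                for dy in range(patch)
                for dx in range(patch)
            ]
            patches.append(flat)
    return patches


# Toy 4x4 "image" whose pixel values are 0..15.
toy_image = [[row * 4 + col for col in range(4)] for row in range(4)]
tokens = split_into_patches(toy_image, patch=2)
print(len(tokens), "patches of length", len(tokens[0]))
```

In a full ViT, each flattened patch would then be linearly projected, combined with a position embedding, and passed through self-attention layers; the attention weights over these patch tokens are what attention maps visualize.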

Results: The ViT correctly classified 83% of the test images. Precision, recall and F1-score were 0.77 (CI 0.64-0.90), 0.76 (CI 0.62-0.91) and 0.77 (CI 0.64-0.89), respectively. When supported by the ViT's predictions, the clinicians' diagnostic accuracy improved by 29% (accuracy 97%; p = 0.003), outperforming the algorithm alone.
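For readers unfamiliar with these metrics, the following sketch shows one common way to derive accuracy and class-averaged precision, recall and F1-score from a multi-class confusion matrix. It assumes macro averaging (an unweighted mean over classes), which the abstract does not specify; the function name `macro_metrics` and the 3-class confusion matrix are invented for illustration and are not the study's data.

```python
def macro_metrics(confusion):
    """Accuracy and macro-averaged precision, recall and F1.

    confusion[i][j] is the number of images whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))

    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Invented 3-class example (not the paper's results).
toy_confusion = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]
for name, value in macro_metrics(toy_confusion).items():
    print(f"{name}: {value:.3f}")
```

Under this assumption, accuracy equals the recall averaged with weights proportional to class size, so accuracy can exceed macro recall when the larger classes are recognized better than the smaller ones.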

Conclusions: This paper showed the potential of Vision Transformers in bone fracture classification. For the first time, good results were obtained in fracture subtype classification, outperforming the state of the art. Moreover, assisted diagnosis yielded the best results, demonstrating the effectiveness of collaboration between neural networks and clinicians.

Keywords: CAD system; Deep learning; Femur fracture; Self-attention; Vision transformer.

MeSH terms

Diagnosis, Computer-Assisted / methods
Femoral Fractures* / diagnostic imaging
Femoral Fractures* / surgery
Femur
Humans
Neural Networks, Computer*
Reproducibility of Results