Study design: Retrospective magnetic resonance imaging grading with comparison between experts and deep convolutional neural networks (CNNs).
Objective: This study aims to verify the feasibility of a computer-assisted spinal stenosis grading system by comparing the diagnostic agreement between two experts with the agreement between each expert and trained CNN classifiers.
Summary of background data: Spinal stenosis grading is important; however, checking MR images slice by slice to grade each patient is tedious, and experts often disagree on the final diagnosis.
Methods: For 542 L4-5 axial MR images, two experts independently localized the center of the spinal canal and graded the stenosis status. Two CNN classifiers, each trained with the grading labels of one of the two experts, were validated using 10-fold cross-validation. Each classifier consisted of a detection CNN that localizes patches near the canal and a classification CNN that predicts the spinal stenosis status in the localized patches. Faster R-CNN was used for the detection model, whereas a VGG network was used for the classification model. Grading agreement between the two experts was compared with the agreement between each expert and the predictions of the CNN models.
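The 10-fold cross-validation described above can be illustrated with a minimal sketch. It assumes a plain shuffled split of the 542 image indices; whether the study stratified folds by grade or grouped images by patient is not stated, and the function name `k_fold_indices` and the seed are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation.

    Assumption: a simple shuffled split, not stratified by grade and
    not grouped by patient.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# 542 images, as in the study; each image is used for validation exactly once.
splits = list(k_fold_indices(542, k=10))
fold_sizes = [len(validation) for _, validation in splits]
```

With 542 samples and 10 folds, two folds hold 55 images and eight hold 54, and each model is trained on the remaining images.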
Results: Grading agreement between the two experts was 77.5% in accuracy and 75% in F1 score. The agreement between the first expert and the model trained with the labels of the first expert was 83% and 75.4%, respectively. The agreement between the second expert and the model trained with the labels of the second expert was 77.9% and 74.9%, respectively. The differences between the two experts were significant, whereas the differences between each expert and the trained models were not significant.
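As an illustration of how agreement in accuracy and F1 score can be computed between two sets of grades, here is a minimal sketch. It treats one rater as the reference and assumes macro-averaged F1 over the grade classes; the averaging scheme used in the study is not stated, and the example grades are invented for demonstration only, not study data.

```python
def accuracy(reference, predicted):
    """Fraction of images on which the two graders assign the same grade."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("grade lists must be non-empty and of equal length")
    return sum(r == p for r, p in zip(reference, predicted)) / len(reference)


def macro_f1(reference, predicted):
    """Unweighted mean of per-grade F1 scores (macro averaging is an assumption)."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("grade lists must be non-empty and of equal length")
    pairs = list(zip(reference, predicted))
    scores = []
    for grade in sorted(set(reference) | set(predicted)):
        tp = sum(r == grade and p == grade for r, p in pairs)
        fp = sum(r != grade and p == grade for r, p in pairs)
        fn = sum(r == grade and p != grade for r, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical grades for eight images (illustrative only).
expert_a = [0, 0, 1, 1, 2, 2, 1, 0]
expert_b = [0, 1, 1, 1, 2, 1, 1, 0]

acc = accuracy(expert_a, expert_b)   # 6 of 8 grades match
f1 = macro_f1(expert_a, expert_b)
```

The same two functions apply unchanged when `expert_b` is replaced by a model's predicted grades, which is how expert-versus-expert and expert-versus-model agreement can be placed on the same scale.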
Conclusion: The results suggest that automatic diagnosis using deep learning is feasible for spinal stenosis grading.
Level of evidence: 4.