Deep Learning Methods for Anatomical Landmark Detection in Video Capsule Endoscopy Images

Proc Future Technol Conf (2020). 2021 Nov;1288:426-434. doi: 10.1007/978-3-030-63128-4_32. Epub 2020 Oct 31.

Abstract

Video capsule endoscopy (VCE) is an emerging technology that allows examination of the entire gastrointestinal (GI) tract with minimal invasion. While traditional endoscopy with biopsy is the gold standard for diagnosing most GI diseases, it is invasive and limited by how far the scope can be advanced into the tract. VCE allows gastroenterologists to investigate GI tract abnormalities in detail, with visualization of every part of the GI tract. The capsule captures continuous real-time images as it is propelled through the GI tract by gut motility. Although VCE allows for thorough examination, reviewing and analyzing up to eight hours of images (compiled as videos) is tedious and not cost-effective. As a step toward automating VCE-based GI disease diagnosis, detecting the location of the capsule would allow a more focused analysis, as well as abnormality detection in each region of the GI tract. In this paper, we compared four deep convolutional neural network (CNN) models for feature extraction and for detecting which anatomical part of the GI tract is captured in VCE images. Our results showed that VGG-Net outperformed the other state-of-the-art architectures (GoogLeNet, AlexNet, and ResNet), achieving the highest average accuracy, precision, recall, and F1-score.

Keywords: AlexNet; Convolutional neural network; Gastrointestinal tract; GoogLeNet; Gradient-weighted class activation mapping (Grad-CAM); ResNet; VGG-net; Video capsule endoscopy.
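The models are compared by average accuracy, precision, recall, and F1-score. The abstract does not state how these are averaged, so the following is only a minimal sketch under one common assumption: overall accuracy plus per-class precision, recall, and F1-score macro-averaged over the anatomical classes, all computed from a confusion matrix. The class names and counts below are hypothetical placeholders, not data from the paper.

```python
# Hypothetical anatomical classes (an assumption; the paper's label set is not given here).
CLASSES = ["esophagus", "stomach", "small_bowel", "colon"]


def summarize(cm):
    """Compute overall accuracy and macro-averaged precision, recall, and F1.

    cm[i][j] is the number of images whose true class is i and whose
    predicted class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))

    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, truly another class
        fn = sum(cm[k]) - tp                        # truly k, predicted another class
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    # Macro average: every class contributes equally, regardless of its size.
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Illustrative (made-up) confusion matrix for one model on 40 test images.
example_cm = [
    [8, 2, 0, 0],
    [1, 9, 0, 0],
    [0, 0, 10, 0],
    [0, 0, 1, 9],
]
print(summarize(example_cm))
```

Macro averaging is used here because it keeps a rarely seen region (for example, the esophagus, which the capsule passes quickly) from being swamped by classes with many frames; a weighted average would be an equally reasonable alternative if the paper intended it.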