Background and study aims: Artificial intelligence (AI)-based systems for computer-aided detection (CADe) of polyps receive regular software updates and sometimes offer customizable detection thresholds. Both can affect performance, but little is known about the size of these effects. This study aimed to compare the performance of different CADe systems, software versions, and detection modes on the same benchmark dataset.
Methods: A total of 101 colonoscopy videos served as the benchmark. Every video frame showing a visible polyp was manually annotated with bounding boxes, yielding 129 705 polyp images. The videos were then analyzed by three CADe systems under five conditions: two versions of GI Genius, Endo-AID with detection Types A and B, and EndoMind, a freely available system. The evaluation covered sensitivity and false-positive rate, among other metrics.
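The abstract does not define how a detection is matched to an annotated bounding box. The following is a minimal sketch of one common way to compute per-frame sensitivity and false positives from such annotations, assuming a detection counts as correct when its intersection over union (IoU) with an annotated box is at least 0.5. The threshold, the `Frame` structure, and the helper names are hypothetical and are not the study's published method. Averaging per-polyp sensitivities is likewise only one possible reading of "mean per-frame sensitivity".

```python
from dataclasses import dataclass, field
from statistics import mean

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Frame:
    truth: list[Box] = field(default_factory=list)      # annotated polyp boxes (empty: no visible polyp)
    predicted: list[Box] = field(default_factory=list)  # boxes output by the CADe system


def frame_counts(frames: list[Frame], iou_threshold: float = 0.5) -> tuple[int, int, int]:
    """Return (frames with a polyp, of those detected, false-positive boxes).

    Assumption: a frame counts as detected if any predicted box overlaps any
    annotated box with IoU >= iou_threshold; a predicted box that matches no
    annotation counts as one false positive.
    """
    positives = detected = false_positives = 0
    for f in frames:
        if f.truth:
            positives += 1
            if any(iou(p, t) >= iou_threshold for p in f.predicted for t in f.truth):
                detected += 1
        # all() over an empty truth list is True, so every box on a polyp-free frame is a false positive
        false_positives += sum(
            1 for p in f.predicted if all(iou(p, t) < iou_threshold for t in f.truth)
        )
    return positives, detected, false_positives


def per_frame_sensitivity(frames: list[Frame], iou_threshold: float = 0.5) -> float:
    """Share of polyp-containing frames in which the polyp was detected."""
    positives, detected, _ = frame_counts(frames, iou_threshold)
    return detected / positives if positives else float("nan")


def mean_sensitivity(per_polyp_frames: list[list[Frame]], iou_threshold: float = 0.5) -> float:
    """Hypothetical aggregation: average the per-frame sensitivity over polyps."""
    return mean(per_frame_sensitivity(fs, iou_threshold) for fs in per_polyp_frames)
```

For example, a sequence with one correctly detected polyp frame, one missed polyp frame, and one box drawn on a polyp-free frame gives `frame_counts(...) == (2, 1, 1)` and a per-frame sensitivity of 0.5. Counting at the frame level rather than the box level mirrors the abstract's "per-frame" wording, but the study's actual matching rule may differ.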
Results: Endo-AID detection Type A, the earlier version of GI Genius, and EndoMind each detected all 93 polyps, whereas the later version of GI Genius and Endo-AID Type B each missed 1 polyp. The mean per-frame sensitivities were 50.63 % and 67.85 % for the earlier and later versions of GI Genius, respectively; 65.60 % and 52.95 % for Endo-AID Types A and B, respectively; and 60.22 % for EndoMind.
Conclusions: This study compares the performance of different CADe systems, software updates, and configuration modes on a common benchmark. Per-frame sensitivity varied with both the software version and the detection mode, which might help clinicians select the system and setting best suited to their specific needs.
© The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution License, permitting unrestricted use, distribution, and reproduction so long as the original work is properly cited. (https://creativecommons.org/licenses/by/4.0/).