Novel evaluation of surgical activity recognition models using task-based efficiency metrics

Aneeq Zia; Liheng Guo; Linlin Zhou; Irfan Essa; Anthony Jarc

doi:10.1007/s11548-019-02025-w

Int J Comput Assist Radiol Surg. 2019 Dec;14(12):2155-2163. doi: 10.1007/s11548-019-02025-w. Epub 2019 Jul 2.

Authors

Aneeq Zia¹, Liheng Guo², Linlin Zhou², Irfan Essa³, Anthony Jarc²

Affiliations

¹ College of Computing, Georgia Institute of Technology, North Ave NW, Atlanta, GA, 30332, USA. aneeqzia@gmail.com.
² Medical Research, Intuitive Surgical, Inc., 5655 Spalding Drive, Norcross, GA, 30092, USA.
³ College of Computing, Georgia Institute of Technology, North Ave NW, Atlanta, GA, 30332, USA.

PMID: 31267333
DOI: 10.1007/s11548-019-02025-w

Abstract

Purpose: Surgical task-based metrics (rather than entire procedure metrics) can be used to improve surgeon training and, ultimately, patient care through focused training interventions. Machine learning models to automatically recognize individual tasks or activities are needed to overcome the otherwise manual effort of video review. Traditionally, these models have been evaluated using frame-level accuracy. Here, we propose evaluating surgical activity recognition models by their effect on task-based efficiency metrics. In this way, we can determine when models have achieved adequate performance for providing surgeon feedback via metrics from individual tasks.

Methods: We propose a new CNN-LSTM model, RP-Net-V2, to recognize the 12 steps of robotic-assisted radical prostatectomies (RARP). We evaluated our model both with conventional measures (e.g., Jaccard Index, task boundary accuracy) and with novel ones, such as the accuracy of efficiency metrics computed from instrument movements and system events.
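As an illustration of the conventional measure named above, the following minimal sketch computes a mean per-class Jaccard Index from frame-level labels. The abstract does not state how the index was averaged, so the per-class mean, the function name `mean_jaccard`, and the toy labels are all assumptions made for this example.

```python
from typing import Hashable, Sequence


def mean_jaccard(truth: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Mean per-class Jaccard index (intersection over union) of frame labels.

    Assumption: classes are averaged with equal weight; the paper may differ.
    """
    if len(truth) != len(pred):
        raise ValueError("label sequences must have the same length")
    if not truth:
        raise ValueError("label sequences must not be empty")

    scores = []
    for label in set(truth) | set(pred):
        # Frames where both sequences carry this label.
        inter = sum(1 for t, p in zip(truth, pred) if t == label and p == label)
        # Frames where at least one sequence carries this label (never zero here).
        union = sum(1 for t, p in zip(truth, pred) if t == label or p == label)
        scores.append(inter / union)
    return sum(scores) / len(scores)


# Hypothetical 10-frame clip with three task labels (not data from the paper).
truth = ["A", "A", "A", "B", "B", "B", "B", "C", "C", "C"]
pred = ["A", "A", "B", "B", "B", "B", "C", "C", "C", "C"]
score = mean_jaccard(truth, pred)
```

In this toy clip the predicted task boundaries are each off by one frame, which lowers the per-class overlap for every task even though most frames are labeled correctly.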

Results: Our proposed model achieves a Jaccard Index of 0.85, thereby outperforming previous models on RARP. Additionally, we show that metrics computed from tasks automatically identified using RP-Net-V2 correlate well with metrics from tasks labeled by clinical experts.
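One way to picture the metrics-based comparison is to compute the same task-level metric twice, once from expert-labeled task boundaries and once from model-predicted ones, and then correlate the two across cases. The sketch below uses task duration as a stand-in metric and Pearson's r as the correlation measure; both choices, the function name `pearson`, and the numbers are assumptions for illustration, since the abstract specifies neither the correlation statistic nor the individual metrics.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Hypothetical durations (seconds) of one task across five cases.
# These values are invented for the example, not taken from the paper.
expert_durations = [120.0, 95.0, 150.0, 80.0, 110.0]
model_durations = [118.0, 99.0, 146.0, 85.0, 108.0]
r = pearson(expert_durations, model_durations)
```

A value of r close to 1 would indicate that the metric derived from automatic task recognition tracks the expert-derived metric closely, which is the kind of agreement the results describe.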

Conclusion: We demonstrate that metrics-based evaluation of surgical activity recognition models is a viable approach to determine when models can be used to quantify surgical efficiencies. We believe this approach and our results illustrate the potential for fully automated, postoperative efficiency reports.

Keywords: Machine learning; Robotic-assisted surgery; Surgeon training; Surgical activity recognition.

MeSH terms

Benchmarking
Clinical Competence*
Humans
Machine Learning*
Male
Models, Anatomic*
Prostatectomy / education*
Robotic Surgical Procedures / methods*
Surgeons / education