Pitfalls in training and validation of deep learning systems

Tom Eelbode; Pieter Sinonquel; Frederik Maes; Raf Bisschops

doi:10.1016/j.bpg.2020.101712

Best Pract Res Clin Gastroenterol. 2021 Jun-Aug;52-53:101712. doi: 10.1016/j.bpg.2020.101712. Epub 2020 Dec 4.

Authors

Tom Eelbode¹, Pieter Sinonquel², Frederik Maes³, Raf Bisschops⁴

Affiliations

¹ Department of Electrical Engineering (ESAT/PSI), KU Leuven, Kasteelpark Arenberg 10/2446, 3001, Leuven, Belgium; Medical Imaging Research Center (MIRC), UZ Leuven, Herestraat 49, 3000, Leuven, Belgium. Electronic address: tom.eelbode@kuleuven.be.
² Department of Gastroenterology and Hepatology, University Hospitals Leuven, Herestraat 49, 3000, Leuven, Belgium; Department of Translational Research in Gastrointestinal Diseases (TARGID), Catholic University Leuven, Herestraat 49, 3000, Leuven, Belgium. Electronic address: pieter.sinonquel@uzleuven.be.
³ Department of Electrical Engineering (ESAT/PSI), KU Leuven, Kasteelpark Arenberg 10/2446, 3001, Leuven, Belgium; Medical Imaging Research Center (MIRC), UZ Leuven, Herestraat 49, 3000, Leuven, Belgium. Electronic address: frederik.maes@kuleuven.be.
⁴ Department of Gastroenterology and Hepatology, University Hospitals Leuven, Herestraat 49, 3000, Leuven, Belgium; Department of Translational Research in Gastrointestinal Diseases (TARGID), Catholic University Leuven, Herestraat 49, 3000, Leuven, Belgium. Electronic address: raf.bisschops@uzleuven.be.

PMID: 34172245

Abstract

The number of publications in endoscopy journals that present deep learning applications has risen sharply in recent years. Deep learning has shown great promise for automated detection, diagnosis and quality improvement in endoscopy. However, the interdisciplinary nature of this work makes it harder to judge its value and applicability. This review discusses pitfalls and common malpractices in training and validating deep learning systems, and proposes practical guidelines for acquiring and handling data so that the resulting system is unbiased and generalizes to routine clinical practice. Finally, some considerations are presented to ensure correct validation and comparison of AI systems.

Keywords: Benchmarking; Deep learning; Reproducibility of results; Supervised machine learning.

Publication types

Review

MeSH terms

Deep Learning / standards*
Humans
Validation Studies as Topic*