Improved Training Efficiency for Retinopathy of Prematurity Deep Learning Models Using Comparison versus Class Labels

Adam Hanif; İlkay Yıldız; Peng Tian; Beyza Kalkanlı; Deniz Erdoğmuş; Stratis Ioannidis; Jennifer Dy; Jayashree Kalpathy-Cramer; Susan Ostmo; Karyn Jonas; R V Paul Chan; Michael F Chiang; J Peter Campbell

doi:10.1016/j.xops.2022.100122

Ophthalmol Sci. 2022 Feb 2;2(2):100122. doi: 10.1016/j.xops.2022.100122. eCollection 2022 Jun.

Affiliations

¹ Department of Ophthalmology, Oregon Health & Science University, Portland, Oregon.
² Department of Electrical and Computer Engineering, Northeastern University, Boston, Massachusetts.
³ Department of Radiology, Athinoula A. Martinos Center for Biomedical Imaging Clinical Computational Neuroimaging Group, Charlestown, Massachusetts.
⁴ Department of Ophthalmology, University of Illinois at Chicago College of Medicine, Chicago, Illinois.
⁵ National Eye Institute, National Institutes of Health, Bethesda, Maryland.

Abstract

Purpose: To compare the efficacy and efficiency of training neural networks for medical image classification using comparison labels indicating relative disease severity versus diagnostic class labels from a retinopathy of prematurity (ROP) image dataset.

Design: Evaluation of diagnostic test or technology.

Participants: Deep learning neural networks trained on expert-labeled wide-angle retinal images from patients undergoing diagnostic ROP examinations as part of the Imaging and Informatics in ROP (i-ROP) cohort study.

Methods: Neural networks were trained with either class or comparison labels indicating plus disease severity in ROP retinal fundus images from 2 datasets. After training and validation, all networks underwent evaluation using a separate test dataset in 1 of 2 binary classification tasks: normal versus abnormal or plus versus nonplus.
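To illustrate the difference between the two label types, the sketch below contrasts a per-image class loss with a per-pair comparison loss. The abstract does not state which loss functions were used, so the logistic (Bradley-Terry-style) pairwise loss here is an assumption chosen because it is a common way to learn from relative-severity labels; the function names are hypothetical.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def class_loss(score, label):
    """Binary cross-entropy on a raw network score for ONE image.

    label is 1 (e.g. plus) or 0 (e.g. nonplus).
    """
    return softplus(-score) if label == 1 else softplus(score)

def comparison_loss(score_a, score_b, a_more_severe):
    """Assumed Bradley-Terry-style loss for ONE image pair.

    The same network scores both images; only the score difference matters,
    so the label says which image looks more severe, not what class it is.
    """
    return class_loss(score_a - score_b, 1 if a_more_severe else 0)
```

Under this assumption, a network trained by comparison only has to order images by severity; a threshold on its score can then be chosen afterward for a binary task such as plus versus nonplus.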

Main outcome measures: Area under the receiver operating characteristic curve (AUC) values were measured to assess network performance.
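As a reminder of what the AUC measures, the following minimal sketch computes it as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, counting ties as one half. This is the standard rank-based definition, not necessarily the evaluation code used in the study, and the function name is hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise ranking (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # Count how often a positive case outranks a negative case.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Three of the four positive/negative pairs are ordered correctly.
print(auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))  # 0.75
```

Because the AUC depends only on how the scores rank the cases, it can be computed for a comparison-trained network without first fixing a decision threshold.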

Results: Given the same number of labels, neural networks learned more efficiently from comparison labels, generating significantly higher AUCs in both classification tasks across both datasets. Similarly, given the same number of images, comparison learning produced networks with significantly higher AUCs across both classification tasks in 1 of 2 datasets. The difference in efficiency and accuracy between models trained on the two label types decreased as the size of the training set increased.

Conclusions: Individually, comparison labels are more informative than class labels, and more of them are available per sample. These findings indicate a potential means of overcoming the common obstacle of data variability and scarcity when training neural networks for medical image classification tasks.
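The abundance point can be made concrete with a small counting sketch. It assumes the upper bound in which every unordered pair of images receives one comparison label; the study need not have labeled every pair, and the function name is hypothetical.

```python
import math

def label_counts(num_images):
    """Return (class labels, maximum comparison labels) for a set of images."""
    class_labels = num_images                    # one label per image
    comparison_labels = math.comb(num_images, 2) # one label per unordered pair
    return class_labels, comparison_labels

print(label_counts(100))  # (100, 4950)
```

The number of possible pairs grows quadratically with the number of images, which is one way to read the statement that comparison labels are more abundant per sample.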

Keywords: Artificial intelligence; Deep learning; Labels; Neural networks; Retinopathy of prematurity.

Abbreviations: ANOVA, analysis of variance; AUC, area under the receiver operating characteristic curve; ICROP, International Classification of Retinopathy of Prematurity; i-ROP, Imaging and Informatics in ROP; ROP, retinopathy of prematurity.