Purpose: To compare performance of a deep-learning enhanced algorithm for automated detection of diabetic retinopathy (DR), to the previously published performance of that algorithm, the Iowa Detection Program (IDP)-without deep learning components-on the same publicly available set of fundus images and previously reported consensus reference standard set, by three US Board certified retinal specialists.
Methods: We used the previously reported consensus reference standard of referable DR (rDR), defined as International Clinical Classification of Diabetic Retinopathy moderate, severe nonproliferative (NPDR), proliferative DR, and/or macular edema (ME). Neither Messidor-2 images, nor the three retinal specialists setting the Messidor-2 reference standard were used for training IDx-DR version X2.1. Sensitivity, specificity, negative predictive value, area under the curve (AUC), and their confidence intervals (CIs) were calculated.
Results: Sensitivity was 96.8% (95% CI: 93.3%-98.8%), specificity was 87.0% (95% CI: 84.2%-89.4%), with 6/874 false negatives, resulting in a negative predictive value of 99.0% (95% CI: 97.8%-99.6%). No cases of severe NPDR, PDR, or ME were missed. The AUC was 0.980 (95% CI: 0.968-0.992). Sensitivity was not statistically different from published IDP sensitivity, which had a CI of 94.4% to 99.3%, but specificity was significantly better than the published IDP specificity CI of 55.7% to 63.0%.
Conclusions: A deep-learning enhanced algorithm for the automated detection of DR, achieves significantly better performance than a previously reported, otherwise essentially identical, algorithm that does not employ deep learning. Deep learning enhanced algorithms have the potential to improve the efficiency of DR screening, and thereby to prevent visual loss and blindness from this devastating disease.