Image Analysis-Based Machine Learning for the Diagnosis of Retinopathy of Prematurity: A Meta-analysis and Systematic Review

Ophthalmol Retina. 2024 Jan 17:S2468-6530(24)00014-9. doi: 10.1016/j.oret.2024.01.013. Online ahead of print.

Abstract

Topic: To evaluate the performance of machine learning (ML) in the diagnosis of retinopathy of prematurity (ROP) and to assess whether it can be an effective automated diagnostic tool for clinical applications.

Clinical relevance: Early detection of ROP is crucial for preventing tractional retinal detachment and blindness in preterm infants.

Methods: Web of Science, PubMed, Embase, IEEE Xplore, and Cochrane Library were searched from inception to October 1, 2022, for published studies on image-based ML for the diagnosis of ROP or the classification of its clinical subtypes. The quality assessment tool for artificial intelligence-centered diagnostic test accuracy studies was used to determine the risk of bias (RoB) of the included original studies. A bivariate mixed-effects model was used for quantitative analysis of the data, and Deeks' test was used to assess publication bias. Quality of evidence was assessed using the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach.
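To illustrate the publication-bias step, the following is a minimal Python sketch of Deeks' funnel plot asymmetry test as it is commonly described: the log diagnostic odds ratio of each study is regressed on the inverse square root of its effective sample size, weighted by that effective sample size. The function names, the 0.5 continuity correction, and the example 2x2 counts are assumptions made for illustration; they are not taken from the review, whose exact implementation is not stated.

```python
import math

def effective_sample_size(n_diseased, n_healthy):
    # ESS = 4 * n1 * n2 / (n1 + n2), as used in Deeks' test
    return 4.0 * n_diseased * n_healthy / (n_diseased + n_healthy)

def log_dor(tp, fp, fn, tn):
    # Log diagnostic odds ratio; add 0.5 to every cell if any cell is zero
    # (a common continuity correction, assumed here)
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (c + 0.5 for c in (tp, fp, fn, tn))
    return math.log((tp * tn) / (fp * fn))

def weighted_slope(x, y, w):
    # Weighted least-squares fit of y = a + b*x; returns (slope, SE, df)
    n = len(x)
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    rss = sum(wi * (yi - intercept - slope * xi) ** 2
              for wi, xi, yi in zip(w, x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se, n - 2

def deeks_test(studies):
    # studies: iterable of (TP, FP, FN, TN) counts, one tuple per study
    x, y, w = [], [], []
    for tp, fp, fn, tn in studies:
        ess = effective_sample_size(tp + fn, fp + tn)
        x.append(1.0 / math.sqrt(ess))
        y.append(log_dor(tp, fp, fn, tn))
        w.append(ess)
    slope, se, df = weighted_slope(x, y, w)
    # Compare the t statistic with a t distribution on df degrees of freedom
    return slope, slope / se, df

# Hypothetical counts, for illustration only (not data from the review)
example = [(90, 5, 10, 95), (45, 8, 5, 92), (180, 12, 20, 188), (30, 2, 6, 62)]
slope, t_stat, df = deeks_test(example)
```

A slope that differs significantly from zero suggests funnel plot asymmetry; a threshold of P < 0.10 is often used for this test.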

Results: Twenty-two studies were included in the systematic review; 4 studies had high or unclear RoB. In the index test domain, only 2 studies had high or unclear RoB because they did not establish predefined thresholds. In the reference standard domain, 3 studies had high or unclear RoB. Regarding applicability, only 1 study was considered to have high or unclear applicability concerns in terms of patient selection. The sensitivity and specificity of image-based ML for the diagnosis of ROP were 0.93 (95% confidence interval [CI]: 0.90-0.94) and 0.95 (95% CI: 0.94-0.97), respectively. The area under the receiver operating characteristic curve (AUC) was 0.98 (95% CI: 0.97-0.99). For the classification of clinical subtypes of ROP, the sensitivity and specificity were 0.93 (95% CI: 0.89-0.96) and 0.93 (95% CI: 0.89-0.95), respectively, and the AUC was 0.97 (95% CI: 0.96-0.98). The classification results were highly similar to those of clinical experts (Spearman's R = 0.879).
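For readers less familiar with these metrics, the sketch below shows how sensitivity and specificity are computed for a single study from its 2x2 counts, with a Wilson score interval as one common choice of 95% CI. It only illustrates the per-study inputs; the pooled estimates above come from a bivariate mixed-effects model, which this sketch does not reproduce. The function names and the example counts are hypothetical.

```python
import math

def sensitivity(tp, fn):
    # Proportion of diseased eyes correctly flagged
    return tp / (tp + fn)

def specificity(tn, fp):
    # Proportion of non-diseased eyes correctly cleared
    return tn / (tn + fp)

def wilson_interval(successes, n, z=1.959964):
    # Wilson score interval for a binomial proportion (95% by default)
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return center - half, center + half

# Hypothetical single-study counts (not data from the review)
tp, fn, tn, fp = 93, 7, 95, 5
sens = sensitivity(tp, fn)                    # 0.93
spec = specificity(tn, fp)                    # 0.95
sens_lo, sens_hi = wilson_interval(tp, tp + fn)
```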

Conclusions: Machine learning algorithms are no less accurate than human experts and hold considerable potential as automated diagnostic tools for ROP. However, given the quality and high heterogeneity of the available evidence, these algorithms should be considered as supplementary tools to assist clinicians in diagnosing ROP.

Financial disclosure(s): Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.

Keywords: Image analysis–based; Machine learning; Meta-analysis; Retinopathy of prematurity; Systematic review.

Publication types

  • Review