Background: Classification of significant prostate carcinoma (sPCa) on MRI using radiomics or deep learning approaches has gained considerable interest because of its potential to assist clinical decision-making.
Objective: To systematically review the literature (i) to determine which algorithms are most frequently used for sPCa classification, (ii) to investigate whether there is a relation between performance and the method or the MRI sequences used, (iii) to assess which study design factors affect sPCa classification performance, and (iv) to determine whether performance has been evaluated in a clinical setting.
Methods: The Embase and Ovid MEDLINE databases were searched for studies that described machine learning or deep learning classification methods discriminating between significant and nonsignificant PCa on multiparametric MRI and that used a valid validation procedure. Study quality was assessed with the modified radiomics quality score. We computed the median area under the receiver operating characteristic curve (AUC) across all methods, together with the interquartile range.
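The summary statistic described in the Methods (median AUC with interquartile range across methods) can be sketched as follows. This is a minimal illustration, not the review's actual analysis code: the abstract does not state how quartiles were computed, so the sketch assumes linear interpolation between order statistics via Python's `statistics.quantiles(..., method="inclusive")`. Other quantile conventions can shift the IQR bounds slightly. The function name `summarize_aucs` and the AUC values are hypothetical and are not data from the included studies.

```python
import statistics


def summarize_aucs(aucs):
    """Return the median and interquartile range (Q1, Q3) of per-method AUCs.

    Assumption: quartiles use linear interpolation ("inclusive" method);
    the review may have used a different quantile convention.
    """
    if len(aucs) < 2:
        raise ValueError("need at least two AUC values")
    # With n=4, quantiles() returns the three quartile cut points [Q1, Q2, Q3].
    q1, _, q3 = statistics.quantiles(aucs, n=4, method="inclusive")
    return statistics.median(aucs), (q1, q3)


# Hypothetical AUCs, one per method, for illustration only.
example_aucs = [0.70, 0.75, 0.78, 0.80, 0.84, 0.90]
median_auc, (q1, q3) = summarize_aucs(example_aucs)
print(f"median AUC = {median_auc:.2f} (IQR {q1:.2f}-{q3:.2f})")
```

Reporting the median and IQR instead of the mean and standard deviation is less sensitive to a few studies with unusually high or low AUCs.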
Results: Of 2846 potentially relevant publications, 27 were included. The algorithms most frequently used in the literature for sPCa classification are logistic regression (22%) and convolutional neural networks (CNNs) (22%). The median AUC was 0.79 (interquartile range: 0.77-0.87). No significant effect of the number of included patients, the image sequences, or the reference standard on the reported performance was found. Three studies described an external validation, and none of the papers described a validation in a prospective clinical trial.
Conclusions: To unlock the promising potential of machine and deep learning approaches, validation studies and prospective clinical studies following an established protocol should be performed to assess their added value in clinical decision-making.
Keywords: Gleason score; classification; clinically significant; deep learning; machine learning; model; mpMRI; prediction; prostate carcinoma; radiomics; systematic review.