Background: Early accurate detection of all skin cancer types is essential to guide appropriate management and to improve morbidity and survival. Melanoma and cutaneous squamous cell carcinoma (cSCC) are high-risk skin cancers which have the potential to metastasise and ultimately lead to death, whereas basal cell carcinoma (BCC) is usually localised with potential to infiltrate and damage surrounding tissue. Anxiety around missing early curable cases needs to be balanced against inappropriate referral and unnecessary excision of benign lesions. Computer-assisted diagnosis (CAD) systems use artificial intelligence to analyse lesion data and arrive at a diagnosis of skin cancer. When used in unreferred settings ('primary care'), CAD may assist general practitioners (GPs) or other clinicians to more appropriately triage high-risk lesions to secondary care. Used alongside clinical and dermoscopic suspicion of malignancy, CAD may reduce unnecessary excisions without missing melanoma cases.
Objectives: To determine the accuracy of CAD systems for diagnosing cutaneous invasive melanoma and atypical intraepidermal melanocytic variants, BCC or cSCC in adults, and to compare their accuracy with that of dermoscopy.
Search methods: We undertook a comprehensive search of the following databases from inception up to August 2016: Cochrane Central Register of Controlled Trials (CENTRAL); MEDLINE; Embase; CINAHL; CPCI; Zetoc; Science Citation Index; US National Institutes of Health Ongoing Trials Register; NIHR Clinical Research Network Portfolio Database; and the World Health Organization International Clinical Trials Registry Platform. We studied reference lists and published systematic review articles.
Selection criteria: Studies of any design that evaluated CAD alone, or in comparison with dermoscopy, in adults with lesions suspicious for melanoma or BCC or cSCC, and compared with a reference standard of either histological confirmation or clinical follow-up.
Data collection and analysis: Two review authors independently extracted all data using a standardised data extraction and quality assessment form (based on QUADAS-2). We contacted authors of included studies where information related to the target condition or diagnostic threshold was missing. We estimated summary sensitivities and specificities separately by type of CAD system, using the bivariate hierarchical model. We compared CAD with dermoscopy using (a) all available CAD data (indirect comparisons), and (b) studies providing paired data for both tests (direct comparisons). We tested the contribution of human decision-making to the accuracy of CAD diagnoses in a sensitivity analysis by removing studies that gave CAD results to clinicians to guide diagnostic decision-making.
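As an illustration of the study-level quantities that feed a summary estimate of sensitivity and specificity, the following minimal sketch converts one 2×2 table into logit-scale estimates with approximate within-study variances. It is not the review's analysis and does not fit the bivariate hierarchical model; the function name `logit_accuracy`, the 0.5 continuity correction, and the example counts are assumptions made for illustration only.

```python
import math


def logit_accuracy(tp, fn, fp, tn, correction=0.5):
    """Per-study sensitivity/specificity on the logit scale (illustrative only).

    tp, fn, fp, tn are the cells of a 2x2 table against the reference standard.
    A continuity correction (an assumption here) is applied to every cell only
    when at least one cell is zero, so the logits stay finite.
    """
    if 0 in (tp, fn, fp, tn):
        tp, fn, fp, tn = (x + correction for x in (tp, fn, fp, tn))
    return {
        "sens": tp / (tp + fn),
        "spec": tn / (tn + fp),
        # logit(p) = log(p / (1 - p)), which reduces to a ratio of cell counts
        "logit_sens": math.log(tp / fn),
        "logit_spec": math.log(tn / fp),
        # Delta-method (large-sample) variances of the logits
        "var_logit_sens": 1 / tp + 1 / fn,
        "var_logit_spec": 1 / tn + 1 / fp,
    }


# Hypothetical counts, not taken from any included study
study = logit_accuracy(tp=90, fn=10, fp=20, tn=80)
print(study["sens"], study["spec"])
```

A hierarchical model would then pool such paired logit estimates across studies while allowing for between-study variation and the correlation between sensitivity and specificity; that pooling step is deliberately left out of this sketch.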
Main results: We included 42 studies. Twenty-four evaluated digital dermoscopy-based CAD systems (Derm-CAD) in 23 study cohorts with 9602 lesions (1220 melanomas, at least 83 BCCs, 9 cSCCs), providing 32 datasets for Derm-CAD and seven for dermoscopy. Eighteen studies evaluated spectroscopy-based CAD (Spectro-CAD) in 16 study cohorts with 6336 lesions (934 melanomas, 163 BCCs, 49 cSCCs), providing 32 datasets for Spectro-CAD and six for dermoscopy. These consisted of 15 studies using multispectral imaging (MSI), two studies using electrical impedance spectroscopy (EIS) and one study using diffuse-reflectance spectroscopy. Studies were incompletely reported and at unclear to high risk of bias across all domains. Included studies inadequately address the review question, due to an abundance of low-quality studies, poor reporting, and recruitment of highly selected groups of participants. We found considerable variation, both across CAD systems and between studies evaluating them, in the hardware and software technologies used, the types of classification algorithm employed, the methods used to train the algorithms, and the lesion morphological features extracted and analysed. Meta-analysis found CAD systems had high sensitivity for correct identification of cutaneous invasive melanoma and atypical intraepidermal melanocytic variants in highly selected populations, but with low and very variable specificity, particularly for Spectro-CAD systems. Pooled data from 22 studies estimated the sensitivity of Derm-CAD for the detection of melanoma as 90.1% (95% confidence interval (CI) 84.0% to 94.0%) and specificity as 74.3% (95% CI 63.6% to 82.7%). Pooled data from eight studies estimated the sensitivity of multispectral imaging CAD (MSI-CAD) as 92.9% (95% CI 83.7% to 97.1%) and specificity as 43.6% (95% CI 24.8% to 64.5%).
When applied to a hypothetical population of 1000 lesions at the mean observed melanoma prevalence of 20%, Derm-CAD would miss 20 melanomas and would lead to 206 false-positive results for melanoma. MSI-CAD would miss 14 melanomas and would lead to 451 false-positive results for melanoma. Preliminary findings suggest CAD systems are at least as sensitive as assessment of dermoscopic images for the diagnosis of invasive melanoma and atypical intraepidermal melanocytic variants. We are unable to make summary statements about the use of CAD in unreferred populations, or its accuracy in detecting keratinocyte cancers, or its use in any setting as a diagnostic aid, because of the paucity of studies.
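The hypothetical-population figures above follow directly from the pooled point estimates and the 20% prevalence. The short sketch below reproduces that arithmetic; the function name `expected_errors` and the rounding to whole lesions are assumptions made for illustration, and the calculation ignores the uncertainty expressed by the confidence intervals.

```python
def expected_errors(sensitivity, specificity, prevalence=0.20, n_lesions=1000):
    """Expected missed melanomas and false positives in a hypothetical cohort.

    Uses point estimates only; confidence intervals are not propagated.
    """
    melanomas = n_lesions * prevalence           # lesions that are melanoma
    non_melanomas = n_lesions - melanomas        # lesions that are not
    missed = melanomas * (1 - sensitivity)       # false negatives
    false_positives = non_melanomas * (1 - specificity)
    return round(missed), round(false_positives)


# Pooled estimates reported in the main results
derm_cad = expected_errors(sensitivity=0.901, specificity=0.743)
msi_cad = expected_errors(sensitivity=0.929, specificity=0.436)

print("Derm-CAD (missed, false positives):", derm_cad)  # (20, 206)
print("MSI-CAD (missed, false positives):", msi_cad)    # (14, 451)
```

Changing `prevalence` shows how strongly the false-positive count depends on the setting: at a lower prevalence, as might be expected in unreferred populations, far more of the 1000 lesions are benign and so exposed to the imperfect specificity.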
Authors' conclusions: In highly selected patient populations all CAD types demonstrate high sensitivity, and could prove useful as a back-up for specialist diagnosis to assist in minimising the risk of missing melanomas. However, the evidence base is currently too poor to establish whether CAD system outputs translate to different clinical decision-making in practice. Insufficient data are available on the use of CAD in community settings, or for the detection of keratinocyte cancers. The evidence base for individual systems is too limited to draw conclusions on which might be preferred for practice. Prospective comparative studies are required that evaluate previously assessed CAD systems as diagnostic aids, in comparison with face-to-face dermoscopy, and in participant populations that are representative of those in which the test would be used in practice.