The number of publications on deep learning for cancer diagnostics is rapidly increasing, and systems are frequently claimed to perform comparable with or better than clinicians. However, few systems have yet demonstrated real-world medical utility. In this Perspective, we discuss reasons for the moderate progress and describe remedies designed to facilitate transition to the clinic. Recent, presumably influential, deep learning studies in cancer diagnostics, of which the vast majority used images as input to the system, are evaluated to reveal the status of the field. By manipulating real data, we then exemplify that much and varied training data facilitate the generalizability of neural networks and thus the ability to use them clinically. To reduce the risk of biased performance estimation of deep learning systems, we advocate evaluation in external cohorts and strongly advise that the planned analyses, including a predefined primary analysis, are described in a protocol preferentially stored in an online repository. Recommended protocol items should be established for the field, and we present our suggestions.