ROC curves and summary measures of accuracy derived from them, such as the area under the ROC curve, have become the standard for describing and comparing the accuracy of diagnostic tests. Methods for estimating ROC curves rely on the existence of a gold standard which dichotomizes patients into disease present or absent. There are, however, many examples of diagnostic tests whose gold standards are not binary-scale, but rather continuous-scale. Unnatural dichotomization of these gold standards leads to bias and inconsistency in estimates of diagnostic accuracy. In this paper, we propose a non-parametric estimator of diagnostic test accuracy which does not require dichotomization of the gold standard. This estimator has an interpretation analogous to the area under the ROC curve. We propose a confidence interval for test accuracy and a statistical test for comparing accuracies of tests from paired designs. We compare the performance (i.e. CI coverage, type I error rate, power) of the proposed methods with several alternatives. An example is presented where the accuracies of two quick blood tests for measuring serum iron concentrations are estimated and compared.