Laboratory tests that validate psychiatric disorder are unavailable. Accordingly, the validity of structured diagnostic interviews such as the Diagnostic Interview Schedule has been assessed through a double-blind test-retest design. This approach compares the Diagnostic Interview Schedule to a clinician's assessment and evaluates its results by three statistics: sensitivity and specificity, for which the clinician's interview serves as the standard, and kappa, which measures concordance between the two interviews. This design is found wanting on several counts: the reinterview may be answered differently because of clinical change or because of its meaning to the respondent; the clinician's interview may be an erratic standard; and the statistics are affected by both the prevalence and the severity of disorder. Furthermore, the statistics may not predict the accuracy of prevalence estimates made by the interview or its ability to detect correlates of disorder. Some alternative approaches are suggested.
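
As a rough illustration of the three statistics and of the dependence on prevalence noted above, the following minimal sketch computes sensitivity, specificity, and the concordance statistic (taken here to be Cohen's chance-corrected kappa) from a 2x2 table in which the clinician's diagnosis is treated as the standard. The function names and the numerical values (sensitivity 0.8, specificity 0.9, and the three prevalences) are hypothetical and are not taken from any study. Holding sensitivity and specificity fixed across prevalences is itself a simplifying assumption, and the sketch does not model severity.

```python
def two_by_two(sens, spec, prev):
    """Expected cell proportions (tp, fp, fn, tn) for an interview with the
    given sensitivity and specificity, at a given prevalence of disorder
    according to the clinician (the assumed standard)."""
    tp = prev * sens              # both interviews positive
    fn = prev * (1 - sens)        # clinician positive, interview negative
    tn = (1 - prev) * spec        # both interviews negative
    fp = (1 - prev) * (1 - spec)  # clinician negative, interview positive
    return tp, fp, fn, tn


def sensitivity(tp, fp, fn, tn):
    """Proportion of clinician-positive cases the interview also calls positive."""
    return tp / (tp + fn)


def specificity(tp, fp, fn, tn):
    """Proportion of clinician-negative cases the interview also calls negative."""
    return tn / (tn + fp)


def kappa(tp, fp, fn, tn):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    total = tp + fp + fn + tn
    observed = (tp + tn) / total
    # Chance agreement from the marginal positive and negative rates.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical values: the same interview accuracy at three prevalences.
for prev in (0.50, 0.20, 0.05):
    table = two_by_two(sens=0.8, spec=0.9, prev=prev)
    print(f"prevalence={prev:.2f}  "
          f"sensitivity={sensitivity(*table):.2f}  "
          f"specificity={specificity(*table):.2f}  "
          f"kappa={kappa(*table):.2f}")
```

Under these assumptions, kappa falls as the disorder becomes rarer even though sensitivity and specificity are unchanged, which is one way in which the concordance statistic can be affected by prevalence.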