A model was developed for a simple clinical trial in which graders had defined probabilities of misclassifying pathologic material as disease present or absent. The authors compared Kappa between graders with the efficiency and bias of the clinical trial in the presence of misclassification. Although Kappa is related to bias and efficiency, it did not predict these two statistics well. These results apply generally to the evaluation of systems for encoding medical information, and to the relevance of Kappa in determining whether such systems are ready for use in comparative studies. The authors conclude that Kappa alone is not informative enough to evaluate the appropriateness of a grading scheme for comparative studies; additional, and perhaps difficult, questions must be addressed for such evaluation.
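As a minimal sketch of why Kappa need not track bias (this is an illustration, not the authors' model), consider two independent graders who share the same sensitivity and specificity. Both Cohen's Kappa and the bias of a single grader's prevalence estimate can then be computed in closed form; the function name and parameterization below are hypothetical conveniences.

```python
def kappa_and_bias(p, se, sp):
    """Analytic Cohen's Kappa between two independent graders who share
    sensitivity `se` and specificity `sp`, given true prevalence `p`,
    plus the bias of one grader's prevalence estimate."""
    # Marginal probability that a grader calls "disease present"
    q = p * se + (1 - p) * (1 - sp)
    # Observed agreement: errors are independent given true status
    p_both_pos = p * se**2 + (1 - p) * (1 - sp)**2
    p_both_neg = p * (1 - se)**2 + (1 - p) * sp**2
    po = p_both_pos + p_both_neg
    # Chance agreement from the identical marginals
    pe = q**2 + (1 - q)**2
    kappa = (po - pe) / (1 - pe)
    bias = q - p  # expected prevalence estimate minus the truth
    return kappa, bias
```

With se = sp = 0.9, this gives Kappa ≈ 0.64 and zero bias at a prevalence of 0.5, but Kappa ≈ 0.39 with a bias of +0.08 at a prevalence of 0.1: the same error probabilities yield very different Kappa values, and the larger bias accompanies the lower Kappa only incidentally, consistent with the abstract's claim that Kappa by itself is a poor predictor of bias.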