The meaning of kappa: probabilistic concepts of reliability and validity revisited

J Clin Epidemiol. 1996 Jul;49(7):775-82. doi: 10.1016/0895-4356(96)00011-x.


A framework, the "agreement concept," is developed to study the use of Cohen's kappa and of alternative measures of chance-corrected agreement in a unified manner. Focusing on intrarater consistency, it is demonstrated that for 2 x 2 tables an adequate choice between different measures of chance-corrected agreement can be made only if the characteristics of the observational setting are taken into account. In particular, a naive use of Cohen's kappa may lead to strikingly overoptimistic estimates of chance-corrected agreement. Such bias can be overcome by more elaborate study designs that allow for an unrestricted estimation of the probabilities at issue. When Cohen's kappa is appropriately applied as a measure of chance-corrected agreement, its values prove to be a linear, and not a parabolic, function of true prevalence. It is further shown how the validity of ratings is influenced by lack of consistency. Depending on the design of a validity study, this may lead, on purely formal grounds, to prevalence-dependent estimates of sensitivity and specificity. Proposed formulas for "chance-corrected" validity indexes fail to adjust for this phenomenon.
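
For orientation, the following is a minimal sketch of the textbook definition of Cohen's kappa for a 2 x 2 table, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from the marginal proportions. It is not the "agreement concept" framework of the paper, and the function name `cohen_kappa_2x2`, the cell labels `a`, `b`, `c`, `d`, and the example counts are assumptions made purely for illustration.

```python
def cohen_kappa_2x2(a, b, c, d):
    """Textbook Cohen's kappa for a 2 x 2 table of paired ratings.

    Hypothetical cell layout (an assumption of this sketch):
      a: both ratings positive
      b: first rating positive, second negative
      c: first rating negative, second positive
      d: both ratings negative
    """
    n = a + b + c + d
    if n <= 0:
        raise ValueError("table must contain at least one observation")

    # Observed proportion of agreement (diagonal cells).
    p_o = (a + d) / n

    # Marginal proportions of a positive rating on each occasion.
    p_first = (a + b) / n
    p_second = (a + c) / n

    # Agreement expected by chance if the two ratings were independent.
    p_e = p_first * p_second + (1 - p_first) * (1 - p_second)

    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")

    return (p_o - p_e) / (1 - p_e)


# Illustrative counts only (not data from the paper).
balanced = cohen_kappa_2x2(40, 10, 10, 40)  # observed agreement 0.80
skewed = cohen_kappa_2x2(85, 5, 5, 5)       # observed agreement 0.90
print(round(balanced, 3), round(skewed, 3))
```

With these made-up counts, the table with the more unbalanced marginals yields the lower kappa even though its observed agreement is higher, which is one simple way to see why the observational setting and prevalence matter when interpreting the coefficient.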

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Models, Statistical
  • Probability*
  • Sensitivity and Specificity