Dependence of weighted kappa coefficients on the number of categories

Epidemiology. 1996 Mar;7(2):199-202. doi: 10.1097/00001648-199603000-00016.


Weighted kappa coefficients are commonly used to quantify inter- or intra-rater reliability or test-retest reliability of ordinal ratings in clinical and epidemiologic applications. In this paper, we assess the dependence of weighted kappa coefficients on the number of categories and the type of weighting scheme, both of which vary between applications. The most commonly used weights are proportional to the deviation between individual ratings ("linear weights") or to the square of that deviation ("quadratic weights"). Quadratically weighted kappa coefficients are equivalent to the intraclass correlation coefficient and to the product-moment correlation coefficient under certain conditions. We illustrate that quadratically weighted kappa coefficients are expected to increase with the number of categories under a broad variety of conditions, whereas linearly weighted kappa coefficients appear to be less sensitive to the number of categories. The number of categories and the type of weighting scheme therefore require careful consideration in the interpretation of weighted kappa coefficients.
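The abstract does not reproduce the formulas, so the following is only a minimal sketch of the two weighting schemes it describes, assuming the standard definition of weighted kappa as one minus the ratio of observed to chance-expected weighted disagreement. The function name `weighted_kappa`, the `scheme` parameter, and the example counts are hypothetical and are not taken from the paper.

```python
def weighted_kappa(table, scheme="linear"):
    """Weighted kappa for a square k x k table of rating counts.

    table[i][j] is the number of subjects placed in category i by the
    first rating and in category j by the second rating.
    Assumed disagreement weights (not quoted from the paper):
      linear:    |i - j| / (k - 1)
      quadratic: (|i - j| / (k - 1)) ** 2
    """
    k = len(table)
    if k < 2 or any(len(row) != k for row in table):
        raise ValueError("table must be square with at least 2 categories")
    if scheme not in ("linear", "quadratic"):
        raise ValueError("scheme must be 'linear' or 'quadratic'")

    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    power = 1 if scheme == "linear" else 2

    observed = 0.0  # weighted disagreement actually observed
    expected = 0.0  # weighted disagreement expected from the marginals alone
    for i in range(k):
        for j in range(k):
            w = (abs(i - j) / (k - 1)) ** power
            observed += w * table[i][j] / n
            expected += w * row_totals[i] * col_totals[j] / (n * n)

    # expected is zero only if every rating falls in a single category,
    # in which case kappa is undefined and this division fails.
    return 1.0 - observed / expected


# Hypothetical 3-category example; the counts are made up for illustration.
example = [
    [30, 8, 2],
    [6, 25, 9],
    [1, 7, 12],
]
print("linear   :", weighted_kappa(example, "linear"))
print("quadratic:", weighted_kappa(example, "quadratic"))
```

With only two categories the two schemes assign identical weights (0 for agreement, 1 for disagreement), so both reduce to the unweighted kappa. The schemes can differ only from three categories upward, which is where the dependence on the number of categories discussed above becomes relevant.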

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Data Interpretation, Statistical*
  • Epidemiologic Methods*
  • Humans
  • Models, Statistical
  • Normal Distribution
  • Observer Variation
  • Reproducibility of Results