Learning how to differ: agreement and reliability statistics in psychiatry

Can J Psychiatry. 1995 Mar;40(2):60-6.


Whenever two or more raters evaluate a patient or student, it may be necessary to determine the degree to which they assign the same label or rating to the subject. The major problem in deciding which statistic to use is the plethora of techniques available. This paper reviews some of the more commonly used ones, such as raw agreement, Cohen's kappa and weighted kappa, and shows that in most circumstances they can all be replaced by the intraclass correlation coefficient (ICC). It also shows how the ICC can be used in situations where the other statistics cannot, and how to select the best subset of raters.
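The statistics named above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's own procedure. The two-rater data are invented for illustration. Quadratic disagreement weights are assumed for the weighted kappa. The ICC form shown is the Shrout and Fleiss ICC(2,1) (two-way random effects, absolute agreement, single rater), one of several ICC variants. All function names are hypothetical.

```python
from collections import Counter


def raw_agreement(a, b):
    """Proportion of subjects given the same label by both raters."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Unweighted Cohen's kappa for two raters: (p_o - p_e) / (1 - p_e).

    p_e is the chance agreement implied by each rater's marginal label counts.
    Undefined (division by zero) if chance agreement is 1.
    """
    n = len(a)
    p_o = raw_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def weighted_kappa(a, b, categories):
    """Cohen's weighted kappa with quadratic disagreement weights (i - j)^2.

    Assumes `categories` is an ordered list of equally spaced ordinal levels.
    Computed as 1 - (observed weighted disagreement / chance-expected one).
    """
    n, k = len(a), len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    w = [[(i - j) ** 2 for j in range(k)] for i in range(k)]
    obs = [[0] * k for _ in range(k)]  # cross-tabulation of the two raters
    for x, y in zip(a, b):
        obs[idx[x]][idx[y]] += 1
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]
    d_obs = sum(w[i][j] * obs[i][j] for i in range(k) for j in range(k)) / n
    d_exp = sum(w[i][j] * row[i] * col[j]
                for i in range(k) for j in range(k)) / (n * n)
    return 1 - d_obs / d_exp


def icc_2_1(ratings):
    """ICC(2,1) from a two-way ANOVA; `ratings` has one row per subject,
    one column per rater (no missing values)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # raters
    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
    ss_err = ss_total - ss_rows - ss_cols
    ms_r = ss_rows / (n - 1)
    ms_c = ss_cols / (k - 1)
    ms_e = ss_err / ((n - 1) * (k - 1))
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)


if __name__ == "__main__":
    # Invented ratings of ten subjects on a 1-4 ordinal scale.
    rater_a = [1, 2, 3, 3, 2, 1, 4, 3, 2, 4]
    rater_b = [1, 2, 3, 2, 2, 1, 4, 4, 2, 3]
    print(f"raw agreement  {raw_agreement(rater_a, rater_b):.3f}")
    print(f"Cohen's kappa  {cohens_kappa(rater_a, rater_b):.3f}")
    print(f"weighted kappa {weighted_kappa(rater_a, rater_b, [1, 2, 3, 4]):.3f}")
    print(f"ICC(2,1)       {icc_2_1([[x, y] for x, y in zip(rater_a, rater_b)]):.3f}")
```

On these made-up ratings, raw agreement is 0.700, unweighted kappa about 0.595, quadratically weighted kappa about 0.857 and ICC(2,1) about 0.870. The closeness of the last two is in line with the well-known near-equivalence of quadratically weighted kappa and the ICC. The ICC function also accepts more than two rater columns, which the two-rater kappa functions do not.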

MeSH terms

  • Humans
  • Mental Disorders / classification
  • Mental Disorders / diagnosis*
  • Mental Disorders / psychology
  • Observer Variation
  • Personality Assessment / statistics & numerical data*
  • Psychiatric Status Rating Scales / statistics & numerical data*
  • Psychometrics
  • Reproducibility of Results