Kappa statistics in the assessment of observer variation: the significance of multiple observers classifying ankle fractures

J Orthop Sci. 2002;7(2):163-6. doi: 10.1007/s007760200028.

Abstract

Studies using kappa statistics have been conducted with varying but limited numbers of observers. The aim of this study was to evaluate the effect of the number of observers on kappa as a measure of observer variation. One hundred orthopedic specialists were asked to assess a random sample of ten sets of standard radiographs drawn from 94 consecutive patients with ankle fractures. The observers were randomly allocated to four groups, which were in turn divided into subgroups with an increasing number of observers. Random subgroups of three observers yielded kappa values from 0.20 to 0.64 for the Lauge-Hansen and 0.27 to 0.90 for the Weber classification system. With an increasing number of observers in the subgroups, kappa stabilized around a mean value, indicating that the sampling variation and standard error decreased. The standard error found in this study makes kappa questionable as a measure of agreement among a small number of observers. Thus, kappa values obtained for a given diagnostic tool at one department are not directly comparable with results from other departments. We conclude that kappa cannot stand alone as a simple measure of observer variation.
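The subgroup effect described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' analysis: it assumes Fleiss' multi-rater kappa (the abstract does not say which kappa variant was used), and it runs on synthetic ratings, not the study data. The three category labels, the 0.7 per-observer accuracy, and the helper names `fleiss_kappa`, `simulate_ratings`, and `kappa_spread` are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter
from statistics import mean, pstdev

# Hypothetical labels, loosely modelled on a three-type classification.
CATEGORIES = ["A", "B", "C"]


def fleiss_kappa(ratings):
    """Fleiss' kappa for ratings[case][observer] = category label.

    Assumes every case is rated by the same number (>= 2) of observers.
    Returns NaN when chance agreement is 1 (only one category ever used).
    """
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    totals = Counter()
    agreement_sum = 0.0
    for case in ratings:
        counts = Counter(case)
        totals.update(counts)
        # Share of observer pairs that agree on this case.
        pairs_agreeing = sum(c * c for c in counts.values()) - n_raters
        agreement_sum += pairs_agreeing / (n_raters * (n_raters - 1))
    p_observed = agreement_sum / n_cases
    p_expected = sum((t / (n_cases * n_raters)) ** 2 for t in totals.values())
    if p_expected == 1.0:
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


def simulate_ratings(n_cases, n_observers, accuracy, rng):
    """Synthetic data: each observer reports the 'true' class with
    probability `accuracy`, otherwise one of the other classes at random."""
    ratings = []
    for case_index in range(n_cases):
        truth = CATEGORIES[case_index % len(CATEGORIES)]
        others = [c for c in CATEGORIES if c != truth]
        ratings.append([
            truth if rng.random() < accuracy else rng.choice(others)
            for _ in range(n_observers)
        ])
    return ratings


def kappa_spread(ratings, subgroup_size, n_draws, rng):
    """Kappa over random observer subgroups: (min, max, mean, SD)."""
    n_observers = len(ratings[0])
    kappas = []
    for _ in range(n_draws):
        chosen = rng.sample(range(n_observers), subgroup_size)
        subset = [[case[i] for i in chosen] for case in ratings]
        k = fleiss_kappa(subset)
        if not math.isnan(k):
            kappas.append(k)
    return min(kappas), max(kappas), mean(kappas), pstdev(kappas)


def main():
    rng = random.Random(2002)  # fixed seed so reruns are reproducible
    # 10 cases and 100 observers mirror the sizes quoted in the abstract.
    ratings = simulate_ratings(10, 100, accuracy=0.7, rng=rng)
    for size in (3, 10, 30):
        low, high, avg, sd = kappa_spread(ratings, size, 200, rng)
        print(f"{size:>2} observers: kappa {low:.2f} to {high:.2f}, "
              f"mean {avg:.2f}, SD {sd:.3f}")


if __name__ == "__main__":
    main()
```

In a simulation of this kind, the range of kappa across random subgroups would be expected to narrow as the subgroup size grows, which is the qualitative pattern the abstract reports. The numbers it prints depend entirely on the assumed accuracy and category mix and should not be compared with the study's values.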

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Ankle Injuries / classification*
  • Ankle Injuries / diagnostic imaging
  • Fractures, Bone / classification*
  • Fractures, Bone / diagnostic imaging
  • Humans
  • Observer Variation
  • Radiography
  • Reproducibility of Results
  • Statistics as Topic*