The statistical analysis of kappa statistics in multiple samples

J Clin Epidemiol. 1996 Sep;49(9):1053-8. doi: 10.1016/0895-4356(96)00057-1.


Abstract

Methods are presented for assessing and comparing the results of k ≥ 2 independent samples of measured agreement or concordance, where in each sample a given member of a pair of observations is classified according to the presence or absence of a binary trait. Examples include the assessment of interobserver agreement across different groups of patients in a clinical study, investigations of sibling concordance across different genetic groups, and meta-analyses of observer agreement across different studies. The methodology described is based on application of goodness-of-fit theory to testing hypotheses concerning kappa statistics. Partitioning methods allow a variety of hypotheses to be tested, including an assessment of the degree of agreement within each sample, a testing procedure based on the pooled data, and a test of heterogeneity that may be used to assess the validity of pooling across samples. Three examples are given.
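The workflow the abstract outlines — estimate kappa within each sample, pool across samples, and test heterogeneity before trusting the pooled value — can be sketched as follows. This is not the paper's goodness-of-fit procedure; it is a closely related inverse-variance approach, and the 2x2 tables and the simple large-sample variance formula used here are illustrative assumptions.

```python
def kappa_2x2(a, b, c, d):
    """Cohen's kappa and an approximate large-sample variance for a 2x2
    agreement table (a, d = agreements; b, c = disagreements)."""
    n = a + b + c + d
    po = (a + d) / n                                     # observed agreement
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2  # chance agreement
    kappa = (po - pe) / (1 - pe)
    var = po * (1 - po) / (n * (1 - pe) ** 2)            # rough approximation
    return kappa, var

def pooled_kappa_and_heterogeneity(tables):
    """Inverse-variance pooled kappa and a heterogeneity statistic Q,
    referred to a chi-square distribution on k - 1 degrees of freedom."""
    stats = [kappa_2x2(*t) for t in tables]
    weights = [1.0 / v for _, v in stats]
    pooled = sum(w * k for w, (k, _) in zip(weights, stats)) / sum(weights)
    q = sum(w * (k - pooled) ** 2 for w, (k, _) in zip(weights, stats))
    return pooled, q, len(tables) - 1

# Two hypothetical samples of paired binary ratings (counts are made up)
tables = [(40, 5, 5, 50), (30, 10, 10, 50)]
pooled, q, df = pooled_kappa_and_heterogeneity(tables)
print(f"pooled kappa = {pooled:.3f}, Q = {q:.3f} on {df} df")
```

A large Q relative to the chi-square reference distribution would argue against pooling, mirroring the heterogeneity test described in the abstract.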

MeSH terms

  • Humans
  • Meta-Analysis as Topic
  • Models, Statistical
  • Observer Variation*
  • Statistics as Topic*