How reliable are chance-corrected measures of agreement?

Stat Med. 1993 Dec 15;12(23):2191-205. doi: 10.1002/sim.4780122305.

Abstract

Chance-corrected measures of agreement can exhibit paradoxical and counter-intuitive results when used as measures of reliability. It is demonstrated that these problems arise with Cohen's kappa as well as with Aickin's alpha, and that they stem from an analogue of Simpson's paradox in mixed populations. It is further shown that chance-corrected measures of agreement may yield misleading values for binary ratings. It is concluded that improvements in the design and analysis of reliability studies are a prerequisite for valid and pertinent results.
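
The misleading behaviour for binary ratings that the abstract mentions can be made concrete with a small numerical sketch. The Python snippet below (hypothetical tables, not taken from the paper) computes Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), for two 2x2 rating tables with identical observed agreement (85%) but different category prevalence; kappa drops sharply in the skewed table even though the raters agree just as often.

```python
def cohen_kappa(table):
    """Cohen's kappa for a 2x2 agreement table [[a, b], [c, d]].

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement and p_e is the agreement expected by chance, computed
    from the row and column marginals.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    p_o = (a + d) / n
    # Chance agreement: product of matching row/column marginal proportions.
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return (p_o - p_e) / (1 - p_e)

# Two hypothetical tables, each with 100 subjects and 85% observed agreement,
# differing only in how common the 'positive' category is.
balanced = [[40, 9], [6, 45]]   # positive prevalence near 50%
skewed   = [[80, 10], [5, 5]]   # positive prevalence near 90%

print(round(cohen_kappa(balanced), 2))  # 0.70
print(round(cohen_kappa(skewed), 2))    # 0.32, despite identical p_o
```

Because p_e rises as the marginals become more extreme, the same raw agreement is "corrected" much more heavily in the skewed population, which is the kind of prevalence-dependent distortion the abstract attributes to chance-corrected indices.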

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Classification
  • Data Interpretation, Statistical*
  • Diagnosis*
  • Epidemiology*
  • Humans
  • Likelihood Functions
  • Linear Models
  • Models, Statistical*
  • Observer Variation*
  • Reproducibility of Results
  • Sensitivity and Specificity*