Background: Kappa is commonly used to assess how well the recording of conditions agrees with a reference standard, but it has been criticized for being highly dependent on prevalence. To overcome this limitation, a prevalence-adjusted and bias-adjusted kappa (PABAK) has been developed. The purpose of this study is to demonstrate the performance of kappa and PABAK, and to assess the agreement between hospital discharge administrative data and chart review data in recording conditions.
Methods: Agreement was compared across three sampling designs (random sampling, sampling restricted by condition, and case-control sampling) using ICD-10 administrative data from four teaching hospitals in Alberta, Canada, for discharges between January 1, 2003 and June 30, 2003. In the random sample, 4,008 hospital discharge records, linked to chart review data by unique personal identifier and admission date, were analyzed for 32 conditions. Restricted samples and case-control samples for hypertension, myocardial infarction, and congestive heart failure were drawn from the random sample. For each condition, the prevalence, kappa, PABAK, positive agreement, and negative agreement were compared across the three samples.
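All five statistics listed above can be computed from the 2×2 table that cross-classifies each record by administrative data and chart review. The sketch below is a minimal illustration, not the study's code. It assumes rows are administrative data and columns are chart review (a = coded in both, b = administrative data only, c = chart review only, d = neither), and it takes prevalence to be the chart-review prevalence (a + c)/n; the abstract does not state which prevalence definition was used, so that choice and the names `TwoByTwo` and `agreement_stats` are hypothetical. The formulas themselves are the standard ones: observed agreement p_o = (a + d)/n, chance agreement p_e from the marginals, kappa = (p_o - p_e)/(1 - p_e), PABAK = 2p_o - 1, positive agreement = 2a/(2a + b + c), and negative agreement = 2d/(2d + b + c).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Hypothetical 2x2 layout: rows = administrative data, columns = chart review.

    a: coded in both sources        b: coded in administrative data only
    c: coded in chart review only   d: coded in neither source
    """
    a: int
    b: int
    c: int
    d: int

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d


def agreement_stats(t: TwoByTwo) -> dict[str, float]:
    """Return prevalence, kappa, PABAK, positive and negative agreement."""
    n = t.n
    p_o = (t.a + t.d) / n  # observed proportion of agreement
    # Chance agreement: product of the "positive" marginals plus
    # product of the "negative" marginals, as proportions of n.
    p_e = ((t.a + t.b) * (t.a + t.c) + (t.c + t.d) * (t.b + t.d)) / n**2
    return {
        # Assumption: prevalence measured against chart review.
        "prevalence": (t.a + t.c) / n,
        "kappa": (p_o - p_e) / (1 - p_e),
        "pabak": 2 * p_o - 1,
        "positive_agreement": 2 * t.a / (2 * t.a + t.b + t.c),
        "negative_agreement": 2 * t.d / (2 * t.d + t.b + t.c),
    }


if __name__ == "__main__":
    # Hypothetical counts for one low-prevalence condition (n = 1,000).
    stats = agreement_stats(TwoByTwo(a=40, b=10, c=20, d=930))
    for name, value in stats.items():
        print(f"{name}: {value:.4f}")
```

With these hypothetical counts (prevalence 6%), kappa is about 0.71 while PABAK is 0.94 and negative agreement is about 0.98, which shows in miniature why reporting the individual cells and both agreement proportions adds information that a single statistic hides.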
Results: The prevalence of each condition was highly dependent on the sampling method, and this variation in prevalence had a significant effect on both kappa and PABAK. For certain conditions with low kappa values, PABAK values were considerably higher. The gap between the two statistics for the same condition narrowed as the prevalence of the condition approached 50%.
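The narrowing gap near 50% prevalence follows directly from the two formulas. Kappa is (p_o - p_e)/(1 - p_e), and PABAK = 2p_o - 1 can be rewritten as (p_o - 0.5)/(1 - 0.5), that is, kappa with the chance-agreement term p_e replaced by 0.5. When the chart-review prevalence is 50%, p_e equals 0.5 whatever the administrative-data marginal is, so the two statistics coincide. The sketch below illustrates this under a hypothetical assumption that administrative coding has a fixed sensitivity (0.80) and specificity (0.98) against chart review; these values and the `kappa_and_pabak` function are illustrative only and are not taken from the study.

```python
def kappa_and_pabak(prevalence: float, sens: float, spec: float) -> tuple[float, float]:
    """Expected kappa and PABAK at a given chart-review prevalence.

    Hypothetical model: administrative coding has fixed sensitivity and
    specificity relative to chart review. Cells are proportions of n.
    """
    a = sens * prevalence               # coded in both sources
    c = (1 - sens) * prevalence         # chart review only
    b = (1 - spec) * (1 - prevalence)   # administrative data only
    d = spec * (1 - prevalence)         # coded in neither source
    p_o = a + d                                   # observed agreement
    p_e = (a + b) * (a + c) + (c + d) * (b + d)   # chance agreement
    kappa = (p_o - p_e) / (1 - p_e)
    pabak = 2 * p_o - 1
    return kappa, pabak


if __name__ == "__main__":
    for prev in (0.01, 0.05, 0.10, 0.25, 0.50):
        k, p = kappa_and_pabak(prev, sens=0.80, spec=0.98)
        print(f"prevalence={prev:.2f}  kappa={k:.3f}  PABAK={p:.3f}  gap={p - k:.3f}")
```

Under these assumed accuracy values both statistics change with prevalence, but the PABAK-minus-kappa gap shrinks steadily and vanishes at 50%; at 1% prevalence kappa is roughly 0.41 while PABAK is roughly 0.96.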
Conclusion: Kappa values varied more widely than PABAK values across the 32 conditions. PABAK values should usually not be interpreted as measuring the same agreement as kappa in administrative data, particularly for conditions with low prevalence. No single agreement statistic captures all the information needed to assess the validity of administrative data. Researchers should report kappa, the prevalence, positive agreement, negative agreement, and the relative frequency in each cell of the 2×2 table (i.e., a, b, c, and d) so that readers can judge the validity of administrative data from multiple aspects.