Study design: Observational, cross-sectional reliability study.
Objectives: To examine the interrater reliability of novice raters in their use of the treatment-based classification (TBC) system for low back pain and to explore the patterns of disagreement in classification errors.
Background: Although the interrater reliability of individual test items in the TBC system is moderate to good, some error persists in classification decision making. Understanding which classification errors are common could direct further refinement of the TBC system.
Methods: Using previously recorded patient data (n = 24), 12 novice raters classified patients according to the TBC schema. These classification results were combined with those of 7 other raters, allowing examination of the overall agreement using the kappa statistic, as well as agreement/disagreement among pairwise comparisons in classification assignments. A chi-square test examined differences in percent agreement between the novice and more experienced raters and differences in classification distributions between these 2 groups of raters.
Results: Among 12 novice raters, there was 80.9% agreement in the pairs of classification (κ = 0.62; 95% confidence interval: 0.59, 0.65) and an overall 75.5% agreement (κ = 0.57; 95% confidence interval: 0.55, 0.69) for the combined data set. Raters were least likely to agree on a classification of stabilization (77.5% agreement). The overall percentage of pairwise classification judgments that disagreed was 24.5%, with the most common disagreement being between manipulation and stabilization (11.0%), followed by a mismatch between stabilization and specific exercise (8.2%).
Conclusion: Additional refinement is needed to reduce rater disagreement that persists in the TBC decision-making algorithm, particularly in the stabilization category. J Orthop Sports Phys Ther 2012;42(9):797-805, Epub 7 June 2012. doi:10.2519/jospt.2012.4078.