Study design: Test-retest design to examine interrater reliability.
Objective: To examine the interrater reliability of individual examination items and of a classification decision-making algorithm when applied by physical therapists with varying levels of experience.
Summary of background data: Classifying patients based on clusters of examination findings has shown promise for improving outcomes. Examining the reliability of examination items and the classification decision-making algorithm may improve the reproducibility of classification methods.
Methods: Patients with low back pain of less than 90 days' duration who were participating in a randomized trial were examined on separate days by different examiners. Interrater reliability of individual examination items important for classification was examined in clinically stable patients using kappa coefficients and intraclass correlation coefficients. Clinicians with varying amounts of experience then used the findings from the first examination to classify each patient with the decision-making algorithm. The reliability of the classification algorithm was examined with kappa coefficients.
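For readers unfamiliar with the kappa coefficient named in the Methods, the following is a minimal sketch of the standard two-rater (Cohen's) form, which corrects observed agreement for the agreement expected by chance. It is an illustration only: the function name `cohen_kappa`, the labels "A", "B", "C", and the ratings are hypothetical and are not taken from the study, and the study may have used a different kappa variant (for example, one suited to more than two raters).

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items (illustrative helper)."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters must label the same non-empty set of items.")
    n = len(rater_a)

    # Observed agreement: proportion of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal proportions, summed over all labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        raise ValueError("Kappa is undefined when chance agreement equals 1.")
    return (observed - expected) / (1.0 - expected)


# Hypothetical classifications of ten patients by two clinicians (not study data).
rater_a = ["A", "A", "B", "B", "C", "C", "A", "B", "C", "A"]
rater_b = ["A", "A", "B", "C", "C", "C", "A", "B", "B", "A"]

kappa = cohen_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the raters agree on 8 of 10 patients, but because some of that agreement would be expected by chance alone, kappa is lower than the raw agreement proportion. The same relationship explains how a raw agreement figure and a smaller kappa value can be reported together.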
Results: A total of 123 patients participated (mean age 37.7 [±10.7] years, 44% female), of whom 60 (49%) remained stable between examinations. Reliability of range of motion, centralization/peripheralization judgments with flexion and extension, and the instability test was moderate to excellent. Reliability of centralization/peripheralization judgments with repeated or sustained extension and of aberrant movement judgments was fair to poor. Overall agreement on classification decisions was 76% (kappa = 0.60, 95% confidence interval 0.56 to 0.64), with no significant differences based on level of experience.
Conclusion: Reliability of the classification algorithm was good. Further research is needed to identify sources of disagreements and improve reproducibility.