Objectives: To determine the intra- and interrater reliability of the Action Research Arm (ARA) test, to assess its ability to detect a minimal clinically important difference (MCID) of 5.7 points, and to identify less reliable test items.
Design: Intrarater reliability of the sum scores and of individual items was assessed by comparing (1) the original rater's ratings of the laboratory measurements of 20 patients with that rater's ratings of the same measurements recorded on videotape, and (2) repeated ratings of the videotaped measurements by the same rater. Interrater reliability was assessed by comparing 2 raters' ratings of the videotaped measurements. The resulting limits of agreement were compared with the MCID.
Patients: Stratified sample, based on the intake ARA score, of 20 chronic stroke patients (median age, 62yr; median time since stroke onset, 3.6yr; mean intake ARA score, 29.2).
Main outcome measures: Spearman's rank-order correlation coefficient (Spearman's rho); intraclass correlation coefficient (ICC); mean difference and limits of agreement, based on ARA sum scores; and weighted kappa, based on individual items.
Results: All intra- and interrater Spearman's rho and ICC values were higher than .98. The mean difference between ratings was highest for the interrater pair (.75; 95% confidence interval, .02-1.48), suggesting a small systematic difference between raters. Intrarater limits of agreement were -1.66 to 2.26; interrater limits of agreement were -2.35 to 3.85. Median weighted kappas exceeded .92.
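Conclusion: The high intra- and interrater reliability of the ARA test was confirmed. Because both the intra- and interrater limits of agreement lay within 5.7 points of zero, the test's ability to detect a clinically relevant difference of 5.7 points was also confirmed.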
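Illustration: The comparison of limits of agreement with the MCID can be sketched in a few lines of Python. This is a minimal sketch, assuming the limits were computed as the mean difference ± 1.96 SD of the paired differences (the Bland-Altman convention; the abstract does not state the formula used). The function names and the example scores are hypothetical and are not study data.

```python
from statistics import mean, stdev

MCID = 5.7  # minimal clinically important difference, as stated in the abstract


def limits_of_agreement(ratings_a, ratings_b, z=1.96):
    """Return (mean difference, lower limit, upper limit) for paired sum scores.

    Assumption: Bland-Altman limits, i.e. mean difference +/- z * sample SD
    of the paired differences.
    """
    if len(ratings_a) != len(ratings_b):
        raise ValueError("ratings must be paired")
    if len(ratings_a) < 2:
        raise ValueError("at least two pairs are needed to estimate the SD")
    diffs = [a - b for a, b in zip(ratings_a, ratings_b)]
    d_mean = mean(diffs)
    d_sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return d_mean, d_mean - z * d_sd, d_mean + z * d_sd


def limits_within_mcid(lower, upper, mcid=MCID):
    """True if both limits of agreement lie closer to zero than the MCID."""
    return max(abs(lower), abs(upper)) < mcid


# Hypothetical ARA sum scores from two raters (illustrative only, not study data).
rater_1 = [10, 20, 30, 40]
rater_2 = [9, 21, 29, 41]

bias, lower, upper = limits_of_agreement(rater_1, rater_2)
print(f"mean difference {bias:.2f}, limits of agreement {lower:.2f} to {upper:.2f}")
print("within MCID:", limits_within_mcid(lower, upper))
```

Applied to the limits reported in Results, limits_within_mcid(-1.66, 2.26) and limits_within_mcid(-2.35, 3.85) both return True, which is the comparison underlying the Conclusion.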