The objective of this study was to analyze the problem of interpreting change scores of ordinal health status measures in clinical research and practice. We explored how change scores are generated in the physical ability scale of the SF-36, one of the most widely used generic health status instruments. Ranking the items by baseline score, by the percentage of patients with severe difficulty, and by Rasch analysis yielded the same rank order of item difficulty. On the interval scale provided by the Rasch model, items reflecting moderate difficulty clustered together. This clustering "inflates" numerical gains for patients with moderate disability compared with patients with very severe or minor physical disability. Calibrating change scores against patients' perception of their level of change in function showed that numerical gains varied substantially with baseline status. We conclude that numerically equal gains may differ in meaning depending on baseline health status. We recommend that evaluative studies report the distribution of baseline health status measures and the distribution of responders by baseline status.
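
The "inflation" effect can be illustrated with a minimal sketch. It assumes a simplified dichotomous Rasch model (the SF-36 physical function items actually have more than two response categories) and purely hypothetical item difficulties chosen so that most items sit near the middle of the logit scale. The function names, the difficulty values, and the three example ability levels are illustrative assumptions, not values from the study.

```python
import math


def expected_raw_score(theta, difficulties):
    """Expected number of items endorsed under a dichotomous Rasch model.

    theta: person ability in logits (higher means less disability).
    difficulties: item difficulties in logits.
    """
    return sum(1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties)


# Hypothetical difficulties (logits): two easy items, six items of
# moderate difficulty clustered near zero, and two hard items.
DIFFICULTIES = [-3.0, -2.5, -0.4, -0.2, -0.1, 0.1, 0.2, 0.4, 2.5, 3.0]


def raw_gain(theta_before, improvement=1.0, difficulties=DIFFICULTIES):
    """Raw-score gain produced by the same improvement on the logit scale."""
    after = expected_raw_score(theta_before + improvement, difficulties)
    before = expected_raw_score(theta_before, difficulties)
    return after - before


# The same one-logit improvement, starting from three assumed baselines.
gain_severe = raw_gain(-4.0)    # very severe disability
gain_moderate = raw_gain(-0.5)  # moderate disability
gain_minor = raw_gain(3.0)      # minor disability

for label, gain in [("severe", gain_severe),
                    ("moderate", gain_moderate),
                    ("minor", gain_minor)]:
    print(f"{label:>8}: raw-score gain = {gain:.2f}")
```

Under these assumptions, an identical improvement on the interval scale produces a larger raw-score gain for the moderate baseline than for either extreme, because more items change from "unlikely" to "likely" endorsed where the item difficulties cluster.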