Before being introduced into wide use, health status instruments should be evaluated for reliability and validity; increasingly, they are also tested for responsiveness to important clinical changes. Although standards exist for assessing these properties, confusion and inconsistency arise for several reasons: multiple statistics are used for the same property; controversy persists over how to measure responsiveness; many of the relevant statistics are unavailable in common software packages; strategies for measuring these properties vary; and it is often unclear how to define a clinically important change in patient status. Using data from a clinical trial of therapy for back pain, we demonstrate the calculation of several statistics for measuring reproducibility and responsiveness, and illustrate the relationships among them. Simple computational guides for several statistics are provided. We conclude that reproducibility should generally be quantified with the intraclass correlation coefficient rather than the more common Pearson r. Assessing reproducibility by retest at one- to two-week intervals (rather than a shorter interval) may yield more realistic estimates of the variability to be observed among control subjects in a longitudinal study. Instrument responsiveness should be quantified with effect size indicators, the modified effect size statistic proposed by Guyatt, or receiver operating characteristic (ROC) curves describing how well various score changes distinguish improved from unimproved patients.
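To illustrate why the intraclass correlation coefficient and the Pearson r can disagree, the sketch below computes a one-way random-effects ICC alongside the Pearson r for hypothetical test-retest scores (the data and function names are our own and are not taken from the back-pain trial): a consistent shift between test and retest leaves the Pearson r at 1 while driving the ICC down, because the ICC penalizes systematic disagreement as well as random error.

```python
import numpy as np

def icc_oneway(scores):
    """One-way random-effects ICC for an (n subjects x k occasions) array."""
    n, k = scores.shape
    subj_means = scores.mean(axis=1)
    grand_mean = scores.mean()
    # Between-subjects and within-subject mean squares from one-way ANOVA
    ms_between = k * ((subj_means - grand_mean) ** 2).sum() / (n - 1)
    ms_within = ((scores - subj_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical test-retest data: every retest score is exactly 4 points higher
scores = np.array([[10.0, 14.0],
                   [12.0, 16.0],
                   [14.0, 18.0]])

pearson_r = np.corrcoef(scores[:, 0], scores[:, 1])[0, 1]
icc = icc_oneway(scores)
print(f"Pearson r = {pearson_r:.2f}, ICC = {icc:.2f}")  # r = 1.00, ICC = 0.00
```

Because the retest shift here is as large as the spread between subjects, the ICC falls all the way to zero even though the rank ordering of subjects (and hence the Pearson r) is perfectly preserved.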
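The two responsiveness indices named above can be sketched as follows. The data are invented, and the formulations shown (mean change divided by the standard deviation of baseline scores for the effect size, and a clinically important change divided by the standard deviation of change among stable subjects for Guyatt's modified statistic) are common forms of these indices, which the reader should check against the definitions in the full text.

```python
import numpy as np

def effect_size(baseline, followup):
    """Standardized effect size: mean improvement / SD of baseline scores."""
    improvement = baseline - followup  # higher score = worse, so this is gain
    return improvement.mean() / baseline.std(ddof=1)

def guyatt_statistic(important_change, stable_changes):
    """Guyatt's responsiveness index: a clinically important change
    divided by the SD of score changes among stable (unimproved) subjects."""
    return important_change / stable_changes.std(ddof=1)

# Hypothetical back-pain disability scores (higher = worse) for treated patients
baseline = np.array([20.0, 18.0, 24.0, 22.0, 16.0])
followup = np.array([14.0, 15.0, 17.0, 18.0, 12.0])

# Hypothetical retest changes among stable control subjects
stable_changes = np.array([1.0, -2.0, 0.0, 2.0, -1.0])

es = effect_size(baseline, followup)
g = guyatt_statistic(important_change=5.0, stable_changes=stable_changes)
print(f"effect size = {es:.2f}, Guyatt statistic = {g:.2f}")
```

The contrast matters in practice: the effect size scales change against between-patient variability at baseline, whereas Guyatt's statistic scales it against the measurement noise seen in patients who have not changed, which is why retest data from stable subjects are needed to compute it.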
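The ROC approach mentioned last can also be sketched briefly. The data below are hypothetical; the area under the curve is computed through the equivalent Mann-Whitney formulation (the probability that a randomly chosen improved patient shows a larger score change than a randomly chosen unimproved one), and each candidate cutoff yields one sensitivity / false-positive-rate point on the curve.

```python
def roc_points(improved, unimproved, cutoffs):
    """Sensitivity and false-positive rate for each candidate change-score cutoff."""
    points = []
    for c in cutoffs:
        sens = sum(x >= c for x in improved) / len(improved)
        fpr = sum(x >= c for x in unimproved) / len(unimproved)
        points.append((c, sens, fpr))
    return points

def roc_auc(improved, unimproved):
    """Area under the ROC curve via the Mann-Whitney formulation:
    P(change in improved patient > change in unimproved), ties counted 1/2."""
    wins = sum((i > u) + 0.5 * (i == u) for i in improved for u in unimproved)
    return wins / (len(improved) * len(unimproved))

# Hypothetical score improvements (baseline minus follow-up)
improved = [5.0, 7.0, 4.0, 9.0, 6.0]     # patients rated as improved
unimproved = [1.0, 3.0, 0.0, 4.0, -1.0]  # patients rated as unimproved

auc = roc_auc(improved, unimproved)
print(f"AUC = {auc:.2f}")
for c, sens, fpr in roc_points(improved, unimproved, cutoffs=[2.0, 4.0, 6.0]):
    print(f"cutoff {c}: sensitivity {sens:.2f}, false-positive rate {fpr:.2f}")
```

Scanning the cutoffs this way shows the trade-off directly: a low cutoff catches every improved patient but misclassifies several stable ones, while a higher cutoff sacrifices sensitivity for specificity, and the AUC summarizes how well the change score discriminates overall.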