Detecting change in individual patients is an important goal of neuropsychological testing. However, limited information is available about test-retest changes, and well-validated prediction methods are lacking. Using a large nonclinical subject group (N = 384), we recently investigated test-retest reliabilities and practice effects on the Wechsler Adult Intelligence Scale and Halstead-Reitan Battery. Data from this group were also used to develop models for predicting follow-up test scores and to establish confidence intervals around the predicted scores. In this article we review those findings, examine their generalizability to new nonclinical and clinical groups, and explore the sensitivity of the prediction models to real change. Despite similarities across samples in reliability coefficients and practice effects, we found limits to the generalizability of the prediction methods. Also, when multiple test measures were considered together, one or more "significant" changes were common in all subject groups, including stable ones. When normative cut-offs that correct for this were employed, sensitivity of the models to neurological recovery and deterioration was modest to good. More complex regression models were not more accurate than the simpler Reliable Change Index with correction for practice effects when confidence intervals for all methods were adjusted for variations in level of baseline test performance.
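
The abstract names the Reliable Change Index with correction for practice effects but does not give its formula. As a rough illustration only, the sketch below uses one commonly cited form (the observed change minus the mean practice effect, divided by the standard error of the difference derived from the baseline standard deviation and the test-retest reliability), which may differ from the exact computation used in the article. It also shows why at least one "significant" change becomes likely when several measures are examined together, under the simplifying assumption that the measures are independent, which real test batteries generally are not. The function names, parameter names, and all numeric values are hypothetical.

```python
import math


def practice_adjusted_rci(baseline, retest, mean_practice, sd_baseline, reliability):
    """Reliable Change Index with a practice-effect correction (assumed form).

    The standard error of measurement is SD * sqrt(1 - r), and the standard
    error of the difference is sqrt(2) * SEM. This is an assumption about the
    formula, not a quotation from the article.
    """
    sem = sd_baseline * math.sqrt(1.0 - reliability)
    se_diff = math.sqrt(2.0) * sem
    return (retest - baseline - mean_practice) / se_diff


def chance_of_any_flag(n_measures, alpha=0.10):
    """Probability that at least one of n independent measures is flagged
    as changed when no real change has occurred (independence is assumed)."""
    return 1.0 - (1.0 - alpha) ** n_measures


if __name__ == "__main__":
    # Hypothetical scores: baseline 100, retest 110, mean practice gain of 3,
    # baseline SD of 15, test-retest reliability of 0.91.
    rci = practice_adjusted_rci(100, 110, 3, 15, 0.91)
    print(f"RCI = {rci:.2f}")

    # With a two-tailed 90% interval (alpha = 0.10) applied to 7 measures:
    print(f"P(at least one flag) = {chance_of_any_flag(7, 0.10):.2f}")
```

An RCI whose absolute value exceeds the chosen critical z value (for example, 1.645 for a two-tailed 90% interval) would be read as a change beyond what measurement error and practice alone would be expected to produce. The second function shows how quickly false positives accumulate across measures, which is the kind of problem the normative cut-offs mentioned above are meant to correct.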