This article explores several methodological issues of patient utility measurement in two randomized controlled clinical trials involving 85 patients with fibromyalgia and 144 with ankylosing spondylitis. In both trials, one baseline and two follow-up measurements of the patients' preferences for their own health state and for several hypothetical states were performed using the rating scale and the standard gamble methods. It was confirmed that standard gamble scores are consistently higher than rating scale scores for both the experienced and the hypothetical states. The 3-month test-retest reliability for hypothetical states, measured by intraclass correlation coefficients, ranged from 0.24 to 0.33 for the rating scale and from 0.43 to 0.70 for the standard gamble. Although the reproducibility is not high, the group mean scores are fairly stable over time. Mean standard gamble scores tend to differ depending on the way the measurements are undertaken: utilities elicited with chained gambles were significantly higher than utilities elicited with basic reference gambles. At the individual level, some inconsistent responses occurred. However, more than 70% of these fell within the bounds of the measurement error, which ranged from 0.11 to 0.13 on the standard gamble (0-1 scale) and from 8 to 10 on the rating scale (0-100 scale). The large number of negative utilities for the severe hypothetical state, which was used as an anchor point in the chained gambles, and the magnitude of these negative utilities (down to -19) call for intensified research efforts to handle these responses in utility calculations.
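
To make the arithmetic behind chained gambles and negative utilities concrete, the following is a minimal sketch in Python. It assumes the conventional textbook formulas for standard gamble scoring (utility equal to the indifference probability for states preferred to death, -p/(1-p) for states considered worse than death, and a linear rescaling for chained gambles). These formulas and all function names are illustrative assumptions; the abstract does not state which transformations the trials actually used.

```python
def _check_probability(p: float) -> None:
    """Reject values that are not valid indifference probabilities."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("indifference probability must lie in [0, 1]")


def basic_gamble_utility(p: float) -> float:
    """Basic reference gamble (assumed convention): the state is compared
    with a gamble between full health (utility 1) and death (utility 0),
    so the utility equals the indifference probability p."""
    _check_probability(p)
    return p


def worse_than_death_utility(p: float) -> float:
    """State judged worse than death (assumed convention): death for
    certain is compared with a gamble between full health (probability p)
    and the state (probability 1 - p). Setting the expected utility of
    the gamble to 0 gives u = -p / (1 - p), which is unbounded below."""
    _check_probability(p)
    if p == 1.0:
        raise ValueError("utility is undefined (minus infinity) at p = 1")
    return -p / (1.0 - p)


def chained_gamble_utility(p: float, anchor_utility: float) -> float:
    """Chained gamble (assumed convention): the state is compared with a
    gamble between full health (probability p) and an anchor state
    (probability 1 - p) whose own utility was measured separately."""
    _check_probability(p)
    return p + (1.0 - p) * anchor_utility


if __name__ == "__main__":
    # Hypothetical respondent: the anchor state is rated worse than death
    # with an indifference probability of 0.95.
    anchor = worse_than_death_utility(0.95)
    print(round(anchor, 2))

    # The same respondent is indifferent at p = 0.8 in the chained gamble.
    print(round(chained_gamble_utility(0.8, anchor), 2))
```

Under these assumed formulas, an indifference probability of 0.95 for a worse-than-death anchor already yields a utility of about -19, the same magnitude as the lowest value reported above, and a single such anchor value can pull a chained utility well below zero. This illustrates why the handling of negative anchor utilities matters for the resulting scores.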