Background: EQ-5D-3L scoring algorithms vary amongst countries, not only in the values of regression coefficients but also in the independent variables included in the regression model (hereafter referred to as model specification). It is unclear how much of this variation is due to differences in health state selection, the relative frequencies with which health states were valued, and the choice of model diagnostic, rather than to genuine differences in population preferences.
Methods: Using aggregate data from a recent review, we identified all model specifications that had been used. For each country, the country's own model was re-fitted, as were all other model specifications. This was done twice: once using all valued health states for each country, and again using a common set of 17 health states for all countries. Goodness of fit was assessed using the following model diagnostics: mean absolute error (MAE), mean squared error (MSE) and rho (the Pearson correlation coefficient between predicted and observed mean utilities), each calculated with and without leave-one-out cross-validation.
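As an illustration of the diagnostics described above, the following is a minimal sketch, not the study's actual code. It assumes an ordinary least squares fit to mean utilities, a main-effects specification (an intercept plus level-2 and level-3 dummies for each of the five dimensions) and purely synthetic data; all function names are hypothetical. Cross-validation here omits one health state at a time.

```python
import numpy as np

def fit_predict(X_train, y_train, X_new):
    """Fit by ordinary least squares (an assumption) and predict new rows."""
    beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    return X_new @ beta

def diagnostics(observed, predicted):
    """MAE, MSE and Pearson rho between predicted and observed utilities."""
    err = predicted - observed
    return {
        "mae": float(np.mean(np.abs(err))),
        "mse": float(np.mean(err ** 2)),
        "rho": float(np.corrcoef(predicted, observed)[0, 1]),
    }

def in_sample(X, y):
    """Diagnostics without cross-validation: fit and predict on all states."""
    return diagnostics(y, fit_predict(X, y, X))

def leave_one_state_out(X, y):
    """Diagnostics with leave-one-out cross-validation over health states."""
    n = len(y)
    predicted = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i          # omit health state i from the fit
        predicted[i] = fit_predict(X[keep], y[keep], X[i:i + 1])[0]
    return diagnostics(y, predicted)

def main_effects_design(states):
    """Hypothetical specification: intercept plus level-2 and level-3 dummies."""
    states = np.asarray(states)           # shape (n_states, 5), levels 1 to 3
    return np.column_stack(
        [np.ones(len(states)), states == 2, states == 3]
    ).astype(float)

# Synthetic example only: 17 random health states with made-up mean utilities.
rng = np.random.default_rng(0)
states = rng.integers(1, 4, size=(17, 5))
X = main_effects_design(states)
y = 1.0 - 0.1 * (states - 1).sum(axis=1) + rng.normal(0.0, 0.02, size=17)

print("without cross-validation:", in_sample(X, y))
print("with cross-validation:   ", leave_one_state_out(X, y))
```

Comparing specifications then amounts to building a different design matrix for each candidate model and ranking the models on the chosen diagnostic, with or without cross-validation.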
Results: Thirteen countries contributed data. Even when using a common set of health states, the preferred model varied across countries. However, the choice of health states did affect the preferred model specification: when using cross-validation, the preferred specification changed in five of ten countries when moving from 17 health states to all valued health states. The relative frequency with which health states were valued had little impact on the preferred model.
Conclusions: Variation in the choice of health states to value is responsible for some, but not all, of the observed heterogeneity in model specification. The relative frequency of health state valuation and the choice of model diagnostic have a limited impact on model preference; however, the use of cross-validation has a substantial impact. The use of cross-validation, implemented through omitting health states rather than respondents, is recommended as one approach to assessing model fit.
Keywords: EQ-5D; Health utility; Heterogeneity; Scoring algorithm.