Multiple regression of cost data: use of generalised linear models

J Health Serv Res Policy. 2004 Oct;9(4):197-204. doi: 10.1258/1355819042250249.


Objective: Choosing an appropriate method for regression analysis of cost data is problematic because the analysis must focus on population means while taking into account the typically skewed distribution of the data. In this paper we illustrate the use of generalised linear models for regression analysis of cost data.

Methods: We consider generalised linear models with either an identity link function (providing additive covariate effects) or a log link function (providing multiplicative effects), and with Gaussian (normal), overdispersed Poisson, gamma, or inverse Gaussian distributions. These models are applied to estimate the treatment effects in two randomised trials, adjusted for baseline covariates. Criteria for choosing an appropriate model are presented.
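
The two ingredients named in the Methods can be made concrete with a minimal Python sketch. The link function decides whether a covariate adds to the mean cost or multiplies it, and each distribution implies a different relationship between variance and mean. The coefficient values and the names `VARIANCE` and `mean_cost` are hypothetical, chosen for illustration only; they are not taken from the paper.

```python
import math

# Variance functions V(mu), each up to a constant dispersion factor.
VARIANCE = {
    "gaussian": lambda mu: 1.0,               # constant variance
    "overdispersed_poisson": lambda mu: mu,   # proportional to the mean
    "gamma": lambda mu: mu ** 2,              # proportional to the mean squared
    "inverse_gaussian": lambda mu: mu ** 3,   # proportional to the mean cubed
}


def mean_cost(link, intercept, treatment_coef, treated):
    """Mean cost implied by a linear predictor under the given link.

    All coefficient values passed in below are hypothetical.
    """
    eta = intercept + treatment_coef * treated
    if link == "identity":
        return eta
    if link == "log":
        return math.exp(eta)
    raise ValueError("unknown link: " + link)


# Identity link: treatment shifts the mean cost by a fixed amount.
additive_effect = (mean_cost("identity", 1000.0, 250.0, 1)
                   - mean_cost("identity", 1000.0, 250.0, 0))

# Log link: treatment multiplies the mean cost by exp(coefficient).
cost_ratio = (mean_cost("log", math.log(1000.0), math.log(1.25), 1)
              / mean_cost("log", math.log(1000.0), math.log(1.25), 0))

print("additive effect:", additive_effect)
print("multiplicative effect:", cost_ratio)
```

Under the identity link the treatment effect is a difference in mean costs, whereas under the log link it is a ratio of mean costs, which is why the two links can lead to different interpretations of the same trial.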

Results: In both examples considered, the Gaussian model fits poorly and other distributions are to be preferred. When the model includes variables of prognostic importance, the choice of distribution can materially affect the estimates obtained; it may also be possible to discriminate between additive and multiplicative covariate effects.
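
Why the distribution matters once a prognostic covariate is in the model can be sketched with a small iteratively reweighted least squares (IRLS) fit. The sketch below assumes a log link, a single continuous covariate, and a variance proportional to a power of the mean (0 for Gaussian, 1 for overdispersed Poisson, 2 for gamma, 3 for inverse Gaussian). The data and the function name `fit_log_link` are hypothetical and are not from the trials analysed in the paper; a fixed number of iterations is used instead of a convergence check to keep the example short.

```python
import math


def fit_log_link(x, y, power, iterations=200):
    """Fit mean = exp(b0 + b1 * x) with variance proportional to mean**power.

    Uses IRLS with a closed-form weighted least-squares step.
    Assumes every y is strictly positive.
    """
    eta = [math.log(v) for v in y]  # start from the observed costs
    b0 = b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(e) for e in eta]
        # Working response and weights for a log link.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        w = [m ** (2 - power) for m in mu]
        # Weighted least squares of z on (1, x).
        sw = sum(w)
        xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
        zbar = sum(wi * zi for wi, zi in zip(w, z)) / sw
        sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
        sxz = sum(wi * (xi - xbar) * (zi - zbar)
                  for wi, xi, zi in zip(w, x, z))
        b1 = sxz / sxx
        b0 = zbar - b1 * xbar
        eta = [b0 + b1 * xi for xi in x]
    return b0, b1


# Hypothetical baseline covariate and skewed costs.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [100.0, 150.0, 180.0, 400.0, 500.0, 1200.0]

FAMILY_POWER = {"gaussian": 0, "overdispersed_poisson": 1,
                "gamma": 2, "inverse_gaussian": 3}
slopes = {name: fit_log_link(x, y, p)[1] for name, p in FAMILY_POWER.items()}

for name, slope in slopes.items():
    print(name, round(slope, 4))
```

The weights show the mechanism: with a log link the Gaussian fit weights each observation by the square of its fitted mean, so the highest costs dominate, while the inverse Gaussian fit weights by the reciprocal of the fitted mean, so the lowest costs dominate. When the only covariate is a treatment indicator, the fitted group means equal the sample group means for every one of these families, so the distributions disagree only once further covariates enter the model.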

Conclusions: Generalised linear models are attractive for the regression of cost data because they provide parametric methods of analysis in which a variety of non-normal distributions can be specified and the way covariates act can be altered. Unlike the use of data transformation in ordinary least-squares regression, generalised linear models make inferences about the mean cost directly.
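
The contrast with data transformation can be illustrated with a minimal Python sketch, assuming the simplest possible case of a model with an intercept only. Ordinary least squares on log cost estimates the mean of the logs, so naive back-transformation gives the geometric mean, whereas a log-link generalised linear model with an intercept only has a fitted mean equal to the arithmetic mean. The cost values are hypothetical.

```python
import math

# Hypothetical right-skewed costs.
costs = [120.0, 150.0, 180.0, 220.0, 300.0, 450.0, 900.0, 4000.0]

arithmetic_mean = sum(costs) / len(costs)

# Intercept-only OLS on log(cost): the intercept is the mean of the logs,
# and exponentiating it gives the geometric mean, not the mean cost.
ols_log_intercept = sum(math.log(c) for c in costs) / len(costs)
geometric_mean = math.exp(ols_log_intercept)

# Intercept-only GLM with a log link: the fitted mean is the arithmetic
# mean, so the intercept is simply its logarithm.
glm_log_intercept = math.log(arithmetic_mean)
glm_fitted_mean = math.exp(glm_log_intercept)

print("arithmetic mean:", arithmetic_mean)
print("back-transformed OLS estimate:", geometric_mean)
print("log-link GLM fitted mean:", glm_fitted_mean)
```

For skewed data the geometric mean falls below the arithmetic mean, so the back-transformed estimate understates the mean cost unless a further correction is applied.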

MeSH terms

  • Costs and Cost Analysis / methods*
  • Health Services Research
  • Linear Models
  • Normal Distribution
  • United Kingdom