Parametric modelling of cost data in medical studies

Stat Med. 2004 Apr 30;23(8):1311-31. doi: 10.1002/sim.1744.


The cost of medical resources used is often recorded for each patient in clinical studies in order to inform decision-making. Although cost data are generally skewed to the right, interest lies in making inferences about the population mean cost. Common methods for non-normal data, such as data transformation, assuming asymptotic normality of the sample mean or non-parametric bootstrapping, are not ideal. This paper describes possible parametric models for analysing cost data. Four example data sets are considered, which have different sample sizes and degrees of skewness. Normal, gamma, log-normal, and log-logistic distributions are fitted, together with three-parameter versions of the latter three distributions. Maximum likelihood estimates of the population mean are found; confidence intervals are derived by a parametric BCa bootstrap and checked by MCMC methods. Differences between model fits and inferences are explored.

Skewed parametric distributions fit cost data better than the normal distribution, and should in principle be preferred for estimating the population mean cost. However, for some data sets, we find that models that fit badly can give similar inferences to those that fit well. Conversely, particularly when sample sizes are not large, different parametric models that fit the data equally well can lead to substantially different inferences. We conclude that inferences are sensitive to choice of statistical model, which itself can remain uncertain unless there is enough data to model the tail of the distribution accurately. Investigating the sensitivity of conclusions to choice of model should thus be an essential component of analysing cost data in practice.
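To make the model-sensitivity point concrete, the sketch below computes maximum likelihood estimates of the population mean cost under gamma and log-normal models. It is a minimal illustration, not the paper's analysis: it assumes Python's standard library only, uses a small hypothetical cost sample, omits the log-logistic and three-parameter models, and replaces the paper's BCa interval with a plain parametric percentile bootstrap. The function names are hypothetical. It relies on two standard facts: the gamma ML fit reproduces the sample mean exactly, whereas the log-normal ML mean is exp(mu + sigma^2/2) computed from log-scale estimates.

```python
import math
import random
import statistics


def gamma_ml_mean(costs):
    # For a gamma model, the ML estimate of the scale given the shape k is
    # mean(x) / k, so the fitted mean k * scale equals the sample mean
    # whatever the fitted shape; the mean needs no iterative fitting.
    return statistics.fmean(costs)


def lognormal_ml_mean(costs):
    """ML fit of a log-normal model and the implied population mean.

    mu and sigma2 are ML estimates on the log scale (divisor n, not n - 1);
    the mean of a log-normal distribution is exp(mu + sigma2 / 2).
    """
    logs = [math.log(c) for c in costs]
    mu = statistics.fmean(logs)
    sigma2 = statistics.pvariance(logs, mu)
    return mu, sigma2, math.exp(mu + sigma2 / 2)


def parametric_bootstrap_ci(costs, n_boot=2000, level=0.95, seed=1):
    """Percentile interval for the log-normal mean (simplified, not BCa).

    Draws samples of the original size from the fitted log-normal,
    refits each one, and takes empirical quantiles of the refitted means.
    """
    rng = random.Random(seed)
    mu, sigma2, _ = lognormal_ml_mean(costs)
    sigma = math.sqrt(sigma2)
    n = len(costs)
    boot = sorted(
        lognormal_ml_mean([rng.lognormvariate(mu, sigma) for _ in range(n)])[2]
        for _ in range(n_boot)
    )
    alpha = (1 - level) / 2
    lo = boot[int(alpha * n_boot)]
    hi = boot[int((1 - alpha) * n_boot) - 1]
    return lo, hi


# Hypothetical, right-skewed per-patient costs (illustrative only).
costs = [120, 85, 430, 60, 1500, 240, 95, 310, 75, 2200]
print("gamma / normal ML mean:", gamma_ml_mean(costs))
print("log-normal ML mean:", lognormal_ml_mean(costs)[2])
print("log-normal 95% percentile CI:", parametric_bootstrap_ci(costs))
```

With these illustrative costs, the gamma fit (like the normal fit) returns the sample mean of 511.5, while the log-normal fit returns a lower value of roughly 477. This is a small instance of two models for skewed data leading to different inferences about the mean.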

Publication types

  • Comparative Study

MeSH terms

  • Clinical Trials as Topic / methods*
  • Clinical Trials as Topic / statistics & numerical data
  • Confidence Intervals
  • Cost-Benefit Analysis / methods*
  • Cost-Benefit Analysis / statistics & numerical data
  • Data Interpretation, Statistical*
  • Health Care Costs / statistics & numerical data
  • Humans
  • Markov Chains
  • Mental Disorders / economics
  • Mental Disorders / therapy
  • Models, Econometric*
  • Monte Carlo Method
  • Normal Distribution
  • Randomized Controlled Trials as Topic