In observational studies, many continuous or categorical covariates may be related to an outcome. Various spline-based procedures or the multivariable fractional polynomial (MFP) procedure can be used to identify important variables and functional forms for continuous covariates. This is the main aim of an explanatory model, as opposed to a model only for prediction. The type of analysis often guides the complexity of the final model. Spline-based procedures and MFP have tuning parameters for choosing the required complexity. To compare model selection approaches, we perform a simulation study in the linear regression context based on a data structure intended to reflect realistic biomedical data. We vary the sample size, variance explained and complexity parameters for model selection. We consider 15 variables. A sample size of 200 (1000) and R(2) = 0.2 (0.8) is the scenario with the smallest (largest) amount of information. For assessing performance, we consider prediction error, correct and incorrect inclusion of covariates, qualitative measures for judging selected functional forms and further novel criteria. From limited information, a suitable explanatory model cannot be obtained. Prediction performance from all types of models is similar. With a medium amount of information, MFP performs better than splines on several criteria. MFP better recovers simpler functions, whereas splines better recover more complex functions. For a large amount of information and no local structure, MFP and the spline procedures often select similar explanatory models.
Copyright © 2012 John Wiley & Sons, Ltd.