Objectives: In fitting regression models, data analysts must often choose a model from several candidate predictor variables that may influence the outcome. Most analysts either assume a linear relationship for continuous predictors, or categorize them and postulate step functions. By contrast, we propose to model possible non-linearity in the relationship between the outcome and several continuous predictors by estimating smooth functions of the predictors. We aim to demonstrate that a structured approach based on fractional polynomials can give a broadly satisfactory practical solution to the problem of simultaneously identifying a subset of 'important' predictors and determining the functional relationship for continuous predictors.
Methods: We discuss the background, and motivate and describe the multivariable fractional polynomial (MFP) approach to model selection from data that include continuous and categorical predictors. Using examples, we compare its results with those from other approaches. We present a small simulation study to compare the functional form of the relationship obtained by fitting fractional polynomials and splines to a single predictor variable.
Results: We illustrate the advantages of the MFP approach over standard techniques of model construction in two real example datasets analyzed with logistic and Cox regression models, respectively. In the simulation study, fractional polynomial models had lower mean squared error and more realistic behaviour than comparable spline models.
Conclusions: In many practical situations, the MFP approach can satisfy the aim of finding models that fit the data well and are also simple, interpretable and potentially transportable to other settings.
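Illustrative sketch: the abstract does not spell out how a fractional polynomial is fitted, so the following minimal Python example may help fix ideas. It assumes the conventional first-degree fractional polynomial (FP1) power set {-2, -1, -0.5, 0, 0.5, 1, 2, 3}, with power 0 read as log(x), and it picks the power by least squares for a single positive predictor and a continuous outcome. The function names (`fp_transform`, `fit_fp1`, `best_fp1`) and the simulated data are hypothetical. This is not the MFP procedure itself, which also selects variables, considers second-degree functions, and works with likelihood-based criteria in models such as logistic and Cox regression.

```python
import math
import random

# Conventional FP power set (an assumption here; not stated in the abstract).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_transform(x, p):
    """Return x**p, with p == 0 denoting log(x); x must be positive."""
    return math.log(x) if p == 0 else x ** p


def fit_fp1(xs, ys, p):
    """Least-squares fit of y = b0 + b1 * fp_transform(x, p).

    Returns (b0, b1, rss), where rss is the residual sum of squares.
    """
    z = [fp_transform(x, p) for x in xs]
    n = len(z)
    z_mean = sum(z) / n
    y_mean = sum(ys) / n
    szz = sum((zi - z_mean) ** 2 for zi in z)
    szy = sum((zi - z_mean) * (yi - y_mean) for zi, yi in zip(z, ys))
    b1 = szy / szz
    b0 = y_mean - b1 * z_mean
    rss = sum((yi - b0 - b1 * zi) ** 2 for zi, yi in zip(z, ys))
    return b0, b1, rss


def best_fp1(xs, ys):
    """Try every power in FP_POWERS and return (power, (b0, b1, rss)) with the smallest rss."""
    fits = {p: fit_fp1(xs, ys, p) for p in FP_POWERS}
    best_power = min(fits, key=lambda q: fits[q][2])
    return best_power, fits[best_power]


if __name__ == "__main__":
    # Simulated data (hypothetical): a logarithmic relationship plus noise.
    random.seed(0)
    xs = [random.uniform(0.5, 10.0) for _ in range(200)]
    ys = [1.0 + 2.0 * math.log(x) + random.gauss(0.0, 0.1) for x in xs]
    power, (b0, b1, rss) = best_fp1(xs, ys)
    print(f"selected power: {power}, b0={b0:.3f}, b1={b1:.3f}, rss={rss:.3f}")
```

Comparing only residual sums of squares keeps the sketch short. A fuller treatment would also test the selected function against the linear fit (power 1) before accepting the extra complexity.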