Methods for categorizing a prognostic variable in a multivariable setting

Stat Med. 2003 Feb 28;22(4):559-71. doi: 10.1002/sim.1333.


The literature is filled with examples of categorization of a continuous prognostic variable in a univariable setting followed by the addition of this categorical variable to an existing multivariable model. Typically, an "optimal" cutpoint for a new prognostic variable is obtained through a systematic search relating the variable to the outcome in a univariable manner. The corresponding categorical variable is then fitted in a multivariable model along with other already established prognostic covariates to assess the additional value of the new variable. This prompts the question of whether the cutpoint search should have been performed in the same multivariable setting where it will ultimately be used. In this paper, we extend the univariable cutpoint search methods (the split-sample approach and the two-fold cross-validation approach) to the multivariable setting, using the -2 × log-likelihood statistic as the correlative measure. A Monte Carlo simulation study demonstrates that both methods are more efficient in detecting the true cutpoint and in estimating the effect size in the multivariable setting than in the univariable setting. The cross-validation method performs better than the split-sample method in univariable as well as multivariable scenarios. For the cross-validation method in the multivariable setting, there is still a substantial loss of power when a cutpoint model is used in cases where there is a continuous relationship between the covariate and the outcome. An example is provided to illustrate the value of the multivariable cross-validation approach.
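
The core idea of searching for a cutpoint inside the multivariable model can be illustrated with a minimal sketch. This is not the authors' procedure: a Gaussian linear model stands in for the proportional hazards model so that the example needs only the Python standard library, the split-sample and two-fold cross-validation steps are omitted, and all names (`solve`, `minus2_loglik`, `search_cutpoint`) and the simulated data are hypothetical. For each candidate cutpoint, the dichotomized variable is fitted together with the established covariate, and the cutpoint with the smallest -2 × log-likelihood is kept.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def minus2_loglik(rows, y):
    """-2 x maximized log-likelihood of a Gaussian linear model (assumed stand-in model)."""
    p, n = len(rows[0]), len(y)
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)  # least-squares estimates via the normal equations
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    return n * (math.log(2 * math.pi * rss / n) + 1)


def search_cutpoint(x, covariates, y, candidates):
    """Return (cutpoint, statistic) minimizing -2 x log-likelihood.

    Each candidate model contains an intercept, the indicator x > c,
    and the established covariates, so the search is multivariable.
    """
    best = None
    for c in candidates:
        rows = [[1.0, 1.0 if xi > c else 0.0] + list(z)
                for xi, z in zip(x, covariates)]
        stat = minus2_loglik(rows, y)
        if best is None or stat < best[1]:
            best = (c, stat)
    return best


# Simulated data (hypothetical): true cutpoint at x = 6, and one
# established covariate z that is correlated with x.
random.seed(0)
n = 200
x = [random.uniform(0, 10) for _ in range(n)]
z = [[0.3 * xi + random.gauss(0, 1)] for xi in x]
y = [1.5 * (xi > 6) + 0.8 * zi[0] + random.gauss(0, 0.3)
     for xi, zi in zip(x, z)]

# Restrict candidates to the inner 80% of observed values so that
# neither group is empty (an empty group makes the model singular).
candidates = sorted(x)[20:180]
best_c, best_stat = search_cutpoint(x, z, y, candidates)
print("selected cutpoint:", round(best_c, 2))
```

Passing an empty covariate list for every subject, for example `search_cutpoint(x, [[] for _ in x], y, candidates)`, reduces the same function to a univariable search, which makes it easy to compare the two settings on the same data. In a split-sample or cross-validation scheme, a search of this kind would be run on one part of the data and the selected cutpoint evaluated on the other.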

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Humans
  • Monte Carlo Method
  • Multivariate Analysis*
  • Prognosis*
  • Proportional Hazards Models*
  • Reproducibility of Results
  • Survival Analysis
  • United States