Background: Developing prognostic models makes it possible to identify the variables that most influence patient outcome and to combine these risk factors in a systematic, reproducible way according to evidence-based methods. The reliability of a model depends on the informed use of statistical methods in combination with prior knowledge of the disease. We reviewed published articles to assess the reporting and methods used to develop new prognostic models in cancer.
Methods: We developed a systematic search string and identified articles from PubMed. We included 47 articles that satisfied the following inclusion criteria: published in 2005; aiming to predict patient outcome; presenting new prognostic models in cancer with a time-to-event outcome and a combination of at least two separate variables; and analysing data using multivariable analysis suitable for time-to-event data.
Results: Of the 47 studies, only 33% (15) used prospective cohort or randomised controlled trial data for model development. In 30% (14) of the studies the data were insufficient, with fewer than 10 events per variable (EPV) used in model development. EPV could not be calculated in a further 40% (19) of the studies. The coding of candidate variables was reported in only 68% (32) of the studies. Although use of continuous variables was reported in all studies, only one article reported following the recommended approach of retaining all these variables as continuous without categorisation. Statistical methods for selecting variables in the multivariable modelling were often flawed. A method that is not recommended, namely using statistical significance in univariate analysis as a pre-screening test to select variables for inclusion in the multivariable model, was applied in 48% (21) of the studies.
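The EPV criterion used above can be illustrated with a minimal sketch. It assumes that EPV is the number of outcome events divided by the number of candidate variables considered in model development; the function names and the example counts are hypothetical and are not taken from any of the reviewed studies.

```python
def events_per_variable(n_events: int, n_candidate_variables: int) -> float:
    """Return events per variable (EPV).

    Assumption: EPV = outcome events / candidate variables considered
    during model development.
    """
    if n_events < 0:
        raise ValueError("n_events must be non-negative")
    if n_candidate_variables <= 0:
        raise ValueError("n_candidate_variables must be positive")
    return n_events / n_candidate_variables


def has_sufficient_epv(n_events: int, n_candidate_variables: int,
                       threshold: float = 10.0) -> bool:
    """Check the 'at least 10 EPV' rule of thumb referred to in the text."""
    return events_per_variable(n_events, n_candidate_variables) >= threshold


# Hypothetical study: 120 deaths observed, 15 candidate variables examined.
epv = events_per_variable(120, 15)
print(f"EPV = {epv:.1f}, sufficient: {has_sufficient_epv(120, 15)}")
```

In this hypothetical case the EPV is 8, so the study would fall into the group with fewer than 10 events per variable. When a report omits either the number of events or the number of candidate variables, the ratio cannot be computed at all, which corresponds to the studies in which EPV could not be calculated.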
Conclusions: We found that published prognostic models are often characterised by both inappropriate methods for developing multivariable models and poor reporting. In addition, models are limited by the lack of studies based on prospective data with a sample size sufficient to avoid overfitting. The use of poor methods compromises the reliability of prognostic models, which are developed to provide objective probability estimates that complement guidelines and the physician's clinical intuition.