Objectives: Assessing the statistical significance of survivorship differences between model-predicted groups is an important step in survivorship studies. Models found to be significant using current methodologies are often assumed to have predictive capability. However, these methods compare parameters from predicted classes, not random samples from homogeneous populations, and they may be insensitive to prediction errors. Type I-like errors can result, wherein models with high prediction error rates are accepted. We have developed and evaluated an alternative statistic for determining the significance of survivorship differences between or among model-derived survivorship classes.
Methods: We propose and evaluate a new statistical test, the F* test, which incorporates parameters reflecting prediction errors that current evaluation methods do not observe.
Results: The Log Rank test identified fewer failed models than the F* test. When both tests were significant, the model was more accurate. Using two prediction models applied to eight datasets, we found that the F* test gave a correct inference five out of eight times, whereas the Log Rank test identified only one model out of the eight correctly.
Conclusion: Our empirical evaluation shows that hypothesis-testing inferences derived using the F* test agree more closely with the accuracy of prediction models than those derived using the other options. Generalizable prediction accuracy should be of paramount importance in model-based survivorship prediction studies.