Background: The two most important considerations in evaluating survival prediction models are 1) predictability, the ability to predict survival risks accurately, and 2) reproducibility, the ability to generalize to samples generated in different studies. We present approaches for assessing the reproducibility of survival risk score predictions across medical centers.
Methods: Reproducibility was evaluated in terms of consistency and transferability. Consistency is the agreement of risk scores predicted between two centers. Transferability from one center to another is the agreement between the risk scores for the second center's samples as predicted by each of the two centers. Transferability can be of two kinds: 1) model transferability, whether a predictive model developed at one center can be applied to predict samples generated at other centers, and 2) signature transferability, whether the signature markers of a predictive model developed at one center can be applied to predict samples from other centers. We considered eight prediction models, including two clinical models, two gene expression models, and their combinations. The predictive performance of the eight models was evaluated using several common measures. Correlation coefficients between the risk scores predicted by different centers were computed to assess both aspects of reproducibility, consistency and transferability.
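As an illustration of the correlation-based assessment described in the Methods, the following is a minimal sketch rather than the study's actual implementation. It assumes that a risk score is a linear predictor (coefficients times covariates) and that two centers each supply a coefficient vector over the same covariates. The function names, the coefficient values, and the simulated covariates are all hypothetical. The sketch computes the transferability from center A to center B as the correlation between the two sets of risk scores for center B's samples.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation between two score vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    """Spearman rank correlation (assumes no tied scores)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return pearson(rank_x, rank_y)

def risk_scores(coefs, covariates):
    """Assumed risk score: linear predictor, one score per sample."""
    return np.asarray(covariates, dtype=float) @ np.asarray(coefs, dtype=float)

def transferability(coefs_a, coefs_b, covariates_b):
    """Agreement of center B's risk scores as predicted by the
    model from center A and by the model from center B."""
    scores_from_a = risk_scores(coefs_a, covariates_b)
    scores_from_b = risk_scores(coefs_b, covariates_b)
    return {
        "pearson": pearson(scores_from_a, scores_from_b),
        "spearman": spearman(scores_from_a, scores_from_b),
    }

# Hypothetical data: 50 samples from center B with 3 covariates,
# and two coefficient vectors standing in for the fitted models.
rng = np.random.default_rng(0)
covariates_b = rng.normal(size=(50, 3))
coefs_b = np.array([0.8, -0.5, 0.3])
coefs_a = coefs_b + rng.normal(scale=0.1, size=3)

result = transferability(coefs_a, coefs_b, covariates_b)
print(result)
```

In practice the coefficient vectors would come from survival models fitted separately at each center; a correlation close to 1 would indicate that the two models rank and scale the second center's patients similarly.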
Results: Two public datasets were analyzed: lung cancer data generated at four medical centers and colon cancer data generated at two medical centers. The risk score estimates for lung cancer patients predicted by three of the four centers agreed reasonably well. In general, a good prediction model showed better cross-center consistency and transferability. For the colon cancer data, the clinical models developed at one medical center (Vanderbilt) showed excellent model transferability and signature transferability when used to predict risk scores for patients from the other center (Moffitt).
Conclusions: This study illustrates an analytical approach to assessing the reproducibility of predictive models and signatures. Based on the analyses of the two cancer datasets, we conclude that models with clinical variables appear to perform reasonably well, with a high degree of consistency and transferability. Further investigation is needed into the reproducibility across studies of prediction models that include gene expression data.