J Stat Softw. 2012 Sep;50(11):1-23.
doi: 10.18637/jss.v050.i11.

Evaluating Random Forests for Survival Analysis using Prediction Error Curves


Ulla B Mogensen et al. J Stat Softw. 2012 Sep.

Abstract

Prediction error curves are increasingly used to assess and compare predictions in survival analysis. This article surveys the R package pec, which provides a set of functions for efficient computation of prediction error curves. The software implements inverse probability of censoring weights to deal with right-censored data and several variants of cross-validation to deal with the apparent error problem. In principle, all kinds of prediction models can be assessed, and the package readily supports most traditional regression modeling strategies, such as Cox regression or additive hazard regression, as well as state-of-the-art machine learning methods such as random forests, a nonparametric method that provides promising alternatives to traditional strategies in low- and high-dimensional settings. We show how the functionality of pec can be extended to as-yet unsupported prediction models. As an example, we implement support for random forest prediction models based on the R packages randomSurvivalForest and party. Using data from the Copenhagen Stroke Study, we use pec to compare random forests to a Cox regression model derived from stepwise variable selection. Reproducible results at the user level are given for publicly available data from the German Breast Cancer Study Group.

Keywords: R; survival prediction; prediction error curves; random survival forest.
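To make the role of the censoring weights concrete, here is a minimal Python sketch of an inverse-probability-of-censoring-weighted (IPCW) Brier score evaluated over a time grid, which is one way a prediction error curve can be computed. It assumes squared-error (Brier) loss, a marginal (covariate-free) reverse Kaplan-Meier estimate of the censoring distribution with naive tie handling, and predicted survival probabilities supplied as plain lists. The function names `reverse_km`, `ipcw_brier` and `prediction_error_curve` are hypothetical; this is not the pec implementation, which is written in R.

```python
def reverse_km(times, events):
    """Kaplan-Meier estimate of the censoring survival function G(t) = P(C > t).

    Roles are swapped: observations with event == 0 (censored) are the
    'events' of the censoring process. Ties are handled naively: everyone
    with T >= u is counted as at risk at time u.
    """
    steps = []  # (time u, value of G just after u)
    g = 1.0
    for u in sorted(set(times)):
        at_risk = sum(1 for t in times if t >= u)
        censored = sum(1 for t, d in zip(times, events) if t == u and d == 0)
        g *= 1.0 - censored / at_risk
        steps.append((u, g))

    def G(t, left_limit=False):
        # Right-continuous step function; left_limit=True returns G(t-).
        value = 1.0
        for u, g_u in steps:
            if u < t or (u == t and not left_limit):
                value = g_u
            else:
                break
        return value

    return G


def ipcw_brier(t, times, events, surv_at_t, G):
    """IPCW Brier score at time t, given predicted S(t | x_i) for every subject i."""
    total = 0.0
    for T, d, s in zip(times, events, surv_at_t):
        if T <= t and d == 1:
            # Event observed by time t: true status is 0, weight 1 / G(T-).
            total += s ** 2 / G(T, True)
        elif T > t:
            # Still event-free at time t: true status is 1, weight 1 / G(t).
            total += (1.0 - s) ** 2 / G(t)
        # Censored at or before t: contributes 0; the weights above compensate.
    return total / len(times)


def prediction_error_curve(grid, times, events, predict_surv):
    """Evaluate the IPCW Brier score on a time grid.

    predict_surv(t) must return the list of predicted S(t | x_i) for all subjects.
    """
    G = reverse_km(times, events)
    return [ipcw_brier(t, times, events, predict_surv(t), G) for t in grid]
```

For example, with `times = [1, 2, 3, 4, 5]`, `events = [1, 0, 1, 1, 0]` and a model that always predicts a survival probability of 0.5, `prediction_error_curve([1.5, 3.5], times, events, lambda t: [0.5] * 5)` gives 0.25 at both time points (up to floating-point rounding). Plotting such values against the grid for several models is what a comparison of prediction error curves amounts to.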


Figures

Figure 1
Predicted survival curves for newData[1,] (left panel), newData[2,] (middle panel), and newData[3,] (right panel). Both random forest approaches used 1000 trees.
Figure 2
The bootstrap .632+ estimates of the prediction error based on 1000 bootstrap samples. Both random forest approaches are based on 1000 trees per bootstrap sample.
Figure 3
The bootstrap cross-validation estimates of the prediction error based on 1000 bootstrap samples. Both random forest approaches are based on 1000 trees per bootstrap sample.
Figure 4
Each of the three panels shows four different estimates of the prediction error together with a cloud of 100 bootstrap cross-validation curves (grey lines). Both random forest approaches are based on 1000 trees per bootstrap sample.
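Figures 2 to 4 report bootstrap-based estimates of the prediction error. As a hedged illustration of how a .632+ estimate can be assembled at a single time point, the Python sketch below applies Efron and Tibshirani's .632+ rule to three inputs that are assumed to be already computed: the apparent (training-data) error, the bootstrap cross-validation (out-of-bag) error, and the no-information error. Using bootstrap cross-validation as the out-of-bag ingredient of the rule, and applying the rule pointwise along the time axis, are assumptions of this sketch rather than a description of pec's internals; the function names `err632plus` and `err632plus_curve` are hypothetical.

```python
def err632plus(apparent, boot_cv, no_info):
    """Efron-Tibshirani .632+ combination of apparent and out-of-bag error at one time point."""
    # Cap the out-of-bag error at the no-information level.
    boot_cv_capped = min(boot_cv, no_info)
    # Relative overfitting rate R in [0, 1]; set to 0 when there is no overfitting to measure.
    if boot_cv > apparent and no_info > apparent:
        r = (boot_cv_capped - apparent) / (no_info - apparent)
    else:
        r = 0.0
    # Weight rises from 0.632 (R = 0) to 1 (R = 1).
    w = 0.632 / (1.0 - 0.368 * r)
    return (1.0 - w) * apparent + w * boot_cv_capped


def err632plus_curve(apparent, boot_cv, no_info):
    """Apply the .632+ rule pointwise to three error curves sampled on the same time grid."""
    return [err632plus(a, b, n) for a, b, n in zip(apparent, boot_cv, no_info)]
```

Because the weight grows from 0.632 towards 1 as the relative overfitting rate approaches 1, a model that fits the training data much better than new data is pulled towards its (capped) out-of-bag error, while a model that does not overfit gets the plain .632 mix of apparent and out-of-bag error.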


