Evaluating Random Forests for Survival Analysis using Prediction Error Curves
- PMID: 25317082
- PMCID: PMC4194196
- DOI: 10.18637/jss.v050.i11
Abstract
Prediction error curves are increasingly used to assess and compare predictions in survival analysis. This article surveys the R package pec, which provides a set of functions for efficient computation of prediction error curves. The software implements inverse probability of censoring weights to deal with right-censored data and several variants of cross-validation to deal with the apparent error problem. In principle, all kinds of prediction models can be assessed, and the package readily supports most traditional regression modeling strategies, such as Cox regression or additive hazard regression, as well as state-of-the-art machine learning methods such as random forests, a nonparametric method that offers a promising alternative to traditional strategies in low- and high-dimensional settings. We show how the functionality of pec can be extended to prediction models that are not yet supported. As an example, we implement support for random forest prediction models based on the R packages randomSurvivalForest and party. Using data from the Copenhagen Stroke Study, we use pec to compare random forests to a Cox regression model derived from stepwise variable selection. Reproducible results on the user level are given for publicly available data from the German breast cancer study group.
Keywords: R; survival prediction; prediction error curves; random survival forest.
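To make the abstract's central idea concrete, the following is a minimal sketch of one point on a prediction error curve: a Brier score at a fixed time, weighted by inverse probability of censoring. It is written in plain Python for illustration only and is not the pec package's implementation. The function names (`censoring_km`, `step_value`, `ipcw_brier`) and the toy data are hypothetical, the censoring distribution is assumed to be independent of covariates (a marginal Kaplan-Meier estimate), and ties are handled naively.

```python
from bisect import bisect_left, bisect_right


def censoring_km(times, events):
    """Kaplan-Meier estimate of the censoring survival function G.

    Censored observations (event == 0) are treated as the "events" here.
    Returns the sorted distinct times and the value of G just after each.
    """
    grid, surv = [], []
    g = 1.0
    for u in sorted(set(times)):
        at_risk = sum(1 for t in times if t >= u)
        censored = sum(1 for t, e in zip(times, events) if t == u and e == 0)
        g *= 1.0 - censored / at_risk
        grid.append(u)
        surv.append(g)
    return grid, surv


def step_value(grid, surv, t, left_limit=False):
    """Evaluate the step function at t, or its left limit G(t-)."""
    idx = bisect_left(grid, t) if left_limit else bisect_right(grid, t)
    return 1.0 if idx == 0 else surv[idx - 1]


def ipcw_brier(times, events, pred_surv, t):
    """IPCW Brier score at time t.

    pred_surv[i] is the model's predicted probability that subject i
    survives beyond t. Subjects censored before t receive weight zero.
    """
    grid, surv = censoring_km(times, events)
    g_t = step_value(grid, surv, t)
    total = 0.0
    for obs_time, event, s in zip(times, events, pred_surv):
        if obs_time <= t and event == 1:
            # Event observed by t: true status is 0, weight 1 / G(T_i-).
            total += (0.0 - s) ** 2 / step_value(grid, surv, obs_time, True)
        elif obs_time > t:
            # Still at risk at t: true status is 1, weight 1 / G(t).
            total += (1.0 - s) ** 2 / g_t
    return total / len(times)


# Toy data (hypothetical): the second subject is censored at time 2.
times = [1, 2, 3, 4]
events = [1, 0, 1, 1]
uninformative = [0.5, 0.5, 0.5, 0.5]
print(ipcw_brier(times, events, uninformative, t=2.5))
```

Evaluating `ipcw_brier` over a grid of time points and plotting the result against time gives a prediction error curve. Lower values indicate better predictions, and a model that always predicts 0.5 serves as a rough uninformative benchmark.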