Random Survival Forest in practice: a method for modelling complex metabolomics data in time to event analysis
- PMID: 27591264
- DOI: 10.1093/ije/dyw145
Abstract
Background: The application of metabolomics in prospective cohort studies is statistically challenging. Given the importance of appropriate statistical methods for selection of disease-associated metabolites in highly correlated complex data, we combined random survival forest (RSF) with an automated backward elimination procedure that addresses such issues.
Methods: We illustrated our RSF approach with data from the European Prospective Investigation into Cancer and Nutrition (EPIC)-Potsdam study, using concentrations of 127 serum metabolites as exposure variables and time to development of type 2 diabetes mellitus (T2D) as the outcome variable. A Cox regression analysis of this data set, using a stepwise selection method, was recently published. The methodological comparison (RSF versus Cox regression) was replicated in two independent cohorts. Finally, R code for implementing the metabolite selection procedure in the RSF syntax is provided.
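The general shape of an automated backward elimination loop can be sketched as follows. This is a minimal illustration in Python, not the authors' R code: the abstract does not specify how variables are ranked, how many are dropped per step, or how the final subset is chosen, so `backward_eliminate`, `drop_fraction`, `min_vars` and the stand-in model `toy_fit` are all hypothetical names and assumptions. In a real analysis, `fit` would train a random survival forest and return its out-of-bag prediction error and per-variable importance.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: fit a model on the given variables and return
# (prediction error, importance score per variable).
FitFn = Callable[[Sequence[str]], Tuple[float, Dict[str, float]]]


def backward_eliminate(
    variables: Sequence[str],
    fit: FitFn,
    drop_fraction: float = 0.2,  # assumed step size, not taken from the paper
    min_vars: int = 1,
) -> Tuple[List[str], float, List[Tuple[float, List[str]]]]:
    """Repeatedly refit, drop the least important variables, and keep
    the variable subset with the lowest recorded error."""
    current = list(variables)
    history: List[Tuple[float, List[str]]] = []
    while True:
        error, importance = fit(current)
        history.append((error, list(current)))
        if len(current) <= min_vars:
            break
        # Drop at least one variable, but never go below min_vars.
        n_drop = max(1, int(len(current) * drop_fraction))
        n_drop = min(n_drop, len(current) - min_vars)
        ranked = sorted(current, key=lambda v: importance[v])  # least important first
        dropped = set(ranked[:n_drop])
        current = [v for v in current if v not in dropped]
    # Lowest error wins; ties are broken in favour of the smaller subset.
    best_error, best_vars = min(history, key=lambda h: (h[0], len(h[1])))
    return best_vars, best_error, history


def toy_fit(variables: Sequence[str]) -> Tuple[float, Dict[str, float]]:
    """Stand-in for a survival forest: three 'signal' variables matter,
    every other variable only adds a small amount of noise."""
    signal = {"m1", "m2", "m3"}
    missing = len(signal - set(variables))
    noise = sum(1 for v in variables if v not in signal)
    error = missing + 0.01 * noise
    importance = {v: (1.0 if v in signal else 0.0) for v in variables}
    return error, importance


candidates = [f"m{i}" for i in range(1, 11)]
best_vars, best_error, history = backward_eliminate(candidates, toy_fit)
print(best_vars, best_error)
```

With the toy model, the loop discards the seven noise variables first and the error then rises as soon as a signal variable is removed, so the three signal variables are returned as the selected subset.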
Results: Applying the RSF approach in EPIC-Potsdam identified 16 metabolites associated with incident T2D. These metabolites slightly improved prediction of T2D when added to traditional T2D risk factors, and also when combined with classical biomarkers. The identified metabolites partly agreed with previous findings using Cox regression, though RSF selected a larger number of highly correlated metabolites.
Conclusions: The RSF method appeared to be a promising approach for identifying disease-associated variables in complex data with time to event as the outcome. The demonstrated RSF approach yields findings comparable to those of the commonly used Cox regression, but also addresses the problem of multicollinearity and is suitable for high-dimensional data.
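The abstract does not state how the improvement in prediction was measured. One widely used measure for right-censored time-to-event data is Harrell's concordance index, sketched below in Python as an illustration only. The function name `harrell_c` and the example values are assumptions, and this simplified version ignores pairs with tied event times.

```python
from typing import Sequence


def harrell_c(
    times: Sequence[float],
    events: Sequence[bool],
    risk: Sequence[float],
) -> float:
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when subject i has an observed event
    and a strictly shorter follow-up time than subject j. The pair is
    concordant when the model assigns i the higher risk score; tied
    risk scores count as half concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Illustrative values: follow-up times, event indicators (False = censored),
# and risk scores from two hypothetical models.
times = [2.0, 4.0, 6.0, 8.0]
events = [True, False, True, True]
good_model = [0.9, 0.3, 0.6, 0.1]    # higher risk for earlier events
mixed_model = [0.9, 0.95, 0.1, 0.6]  # partly misordered
print(harrell_c(times, events, good_model))   # 1.0
print(harrell_c(times, events, mixed_model))  # 0.5
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so a small gain from adding metabolites to established risk factors would show up as a small increase in this index.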
Keywords: Cox proportional hazards regression; exploratory survival analysis; metabolomics; multicollinearity; random survival forest; right-censored data; type 2 diabetes mellitus; variable selection.
© The Author 2016; all rights reserved. Published by Oxford University Press on behalf of the International Epidemiological Association.