Random survival forests with multivariate longitudinal endogenous covariates

Stat Methods Med Res. 2023 Dec;32(12):2331-2346. doi: 10.1177/09622802231206477. Epub 2023 Oct 27.

Abstract

Predicting the individual risk of clinical events using the complete patient history is a major challenge in personalized medicine. Analytical methods have to account for a possibly large number of time-dependent predictors, which are often characterized by irregular and error-prone measurements and are truncated early by the event. In this work, we extended competing-risk random survival forests to handle such endogenous longitudinal predictors when predicting event probabilities. The method, implemented in the R package DynForest, internally transforms the time-dependent predictors at each node of each tree into time-fixed features (using mixed models) that can then be used as splitting candidates. The final individual event probability is computed as the average of leaf-specific Aalen-Johansen estimators over the trees. Using simulations, we compared the predictive performance of DynForest with that of (i) a joint modeling alternative when considering only two longitudinal predictors, and (ii) a regression calibration method that ignores the informative truncation by the event when dealing with a large number of longitudinal predictors. Through an application in dementia research, we also illustrated how DynForest can be used to develop a dynamic prediction tool for dementia from multimodal repeated markers and to quantify the importance of each marker.
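To make the final prediction step concrete, the following is a minimal sketch of averaging leaf-specific Aalen-Johansen estimates over trees. It is not the DynForest implementation or its API: the function names (`aalen_johansen_cif`, `forest_cif`), the toy leaf data, and the coding of censoring as cause 0 are all assumptions made for illustration, and the mixed-model feature extraction and tree-building steps are not shown.

```python
def aalen_johansen_cif(times, causes, cause, horizon):
    """Aalen-Johansen cumulative incidence of `cause` by time `horizon`.

    times  : observed times of the subjects in one leaf
    causes : event type per subject (assumed coding: 0 = censored)
    """
    # Distinct times at which an event of any cause occurs, up to the horizon
    event_times = sorted(
        {t for t, c in zip(times, causes) if c != 0 and t <= horizon}
    )
    surv = 1.0  # all-cause survival just before the current event time
    cif = 0.0
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        d_cause = sum(1 for u, c in zip(times, causes) if u == t and c == cause)
        d_any = sum(1 for u, c in zip(times, causes) if u == t and c != 0)
        cif += surv * d_cause / at_risk   # increment for the cause of interest
        surv *= 1.0 - d_any / at_risk     # Kaplan-Meier update over all causes
    return cif


def forest_cif(leaves, cause, horizon):
    """Average the leaf-specific estimates over the trees.

    leaves : one (times, causes) pair per tree, namely the leaf that the
             new subject falls into in that tree (hypothetical input format)
    """
    estimates = [aalen_johansen_cif(t, c, cause, horizon) for t, c in leaves]
    return sum(estimates) / len(estimates)


# Toy example: the subject lands in one leaf in each of two trees
leaves = [
    ([1, 2, 3, 4], [1, 0, 2, 1]),  # tree 1 leaf: times, causes
    ([2, 2, 5], [1, 2, 0]),        # tree 2 leaf: times, causes
]
risk = forest_cif(leaves, cause=1, horizon=3.5)
print(round(risk, 4))
```

In this toy example the two leaf estimates are 1/4 and 1/3, so the averaged probability of a cause-1 event by time 3.5 is 7/24. The competing event (cause 2) lowers the all-cause survival term without contributing to the cumulative incidence of cause 1.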

Keywords: Individual dynamic prediction; competing risks; longitudinal data; multivariate predictors; random survival forest; survival data.

MeSH terms

  • Dementia*
  • Humans
  • Models, Statistical*
  • Probability
  • Regression Analysis
  • Survival Analysis