Identification of influential observations in high-dimensional survival data through robust penalized Cox regression based on trimming

Math Biosci Eng. 2023 Jan 11;20(3):5352-5378. doi: 10.3934/mbe.2023248.


Penalized Cox regression can efficiently be used for the determination of biomarkers in high-dimensional genomic data related to disease prognosis. However, results of Penalized Cox regression is influenced by the heterogeneity of the samples who have different dependent structure between survival time and covariates from most individuals. These observations are called influential observations or outliers. A robust penalized Cox model (Reweighted Elastic Net-type maximum trimmed partial likelihood estimator, Rwt MTPL-EN) is proposed to improve the prediction accuracy and identify influential observations. A new algorithm AR-Cstep to solve Rwt MTPL-EN model is also proposed. This method has been validated by simulation study and application to glioma microarray expression data. When there were no outliers, the results of Rwt MTPL-EN were close to the Elastic Net (EN). When outliers existed, the results of EN were impacted by outliers. And whenever the censored rate was large or low, the robust Rwt MTPL-EN performed better than EN. and could resist the outliers in both predictors and response. In terms of outliers detection accuracy, Rwt MTPL-EN was much higher than EN. The outliers who "lived too long" made EN perform worse, but were accurately detected by Rwt MTPL-EN. Through the analysis of glioma gene expression data, most of the outliers identified by EN were those "failed too early", but most of them were not obvious outliers according to risk estimated from omics data or clinical variables. Most of the outliers identified by Rwt MTPL-EN were those who "lived too long", and most of them were obvious outliers according to risk estimated from omics data or clinical variables. Rwt MTPL-EN can be adopted to detect influential observations in high-dimensional survival data.

Keywords: heterogeneity; high-dimensional survival data; influential observations; omics data; outliers; penalized Cox regression; robust; trimming.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Biomarkers / metabolism
  • Computer Simulation
  • Genomics*
  • Humans
  • Proportional Hazards Models


  • Biomarkers