Integration of gene interaction information into a reweighted Lasso-Cox model for accurate survival prediction

Bioinformatics. 2021 Apr 1;36(22-23):5405-5414. doi: 10.1093/bioinformatics/btaa1046.

Abstract

Motivation: Accurately predicting the risk of cancer patients is a central challenge for clinical cancer research. For high-dimensional gene expression data, Cox proportional hazard model with the least absolute shrinkage and selection operator for variable selection (Lasso-Cox) is one of the most popular feature selection and risk prediction algorithms. However, the Lasso-Cox model treats all genes equally, ignoring the biological characteristics of the genes themselves. This often encounters the problem of poor prognostic performance on independent datasets.

Results: Here, we propose a Reweighted Lasso-Cox (RLasso-Cox) model to ameliorate this problem by integrating gene interaction information. It is based on the hypothesis that topologically important genes in the gene interaction network tend to have stable expression changes. We used random walk to evaluate the topological weight of genes, and then highlighted topologically important genes to improve the generalization ability of the RLasso-Cox model. Experiments on datasets of three cancer types showed that the RLasso-Cox model improves the prognostic accuracy and robustness compared with the Lasso-Cox model and several existing network-based methods. More importantly, the RLasso-Cox model has the advantage of identifying small gene sets with high prognostic performance on independent datasets, which may play an important role in identifying robust survival biomarkers for various cancer types.

Availability and implementation: http://bioconductor.org/packages/devel/bioc/html/RLassoCox.html.

Supplementary information: Supplementary data are available at Bioinformatics online.