OSCAR: Optimal subset cardinality regression using the L0-pseudonorm with applications to prognostic modelling of prostate cancer

PLoS Comput Biol. 2023 Mar 10;19(3):e1010333. doi: 10.1371/journal.pcbi.1010333. eCollection 2023 Mar.


In many real-world applications, such as those based on electronic health records, prognostic prediction of patient survival is based on heterogeneous sets of clinical laboratory measurements. To address the trade-off between the predictive accuracy of a prognostic model and the costs related to its clinical implementation, we propose an optimized L0-pseudonorm approach to learn sparse solutions in multivariable regression. The model sparsity is maintained by restricting the number of nonzero coefficients in the model with a cardinality constraint, which makes the optimization problem NP-hard. In addition, we generalize the cardinality constraint for grouped feature selection, which makes it possible to identify key sets of predictors that may be measured together in a kit in clinical practice. We demonstrate the operation of our cardinality constraint-based feature subset selection method, named OSCAR, in the context of prognostic prediction of prostate cancer patients, where it enables one to determine the key explanatory predictors at different levels of model sparsity. We further explore how the model sparsity affects the model accuracy and implementation cost. Lastly, we demonstrate generalization of the presented methodology to high-dimensional transcriptomics data.
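To make the cardinality constraint concrete, the sketch below fits an ordinary least-squares model under the restriction that at most k features (or at most k feature groups, standing in for measurement kits) may have nonzero coefficients. It is not the OSCAR optimizer. It enumerates every admissible subset by brute force, which is feasible only for a handful of predictors and shows why the NP-hard problem needs a dedicated method. The function name `best_subset`, the squared-error loss, the omission of an intercept, and the synthetic data are all assumptions made for illustration.

```python
from itertools import combinations

import numpy as np


def best_subset(X, y, k, groups=None):
    """Least squares with at most k active feature groups (exhaustive search).

    Hypothetical helper for illustration only. `groups` is a list of lists of
    column indices; by default every column forms its own group, which reduces
    to a plain cardinality (L0-pseudonorm) constraint on the coefficients.
    """
    n, p = X.shape
    if groups is None:
        groups = [[j] for j in range(p)]

    # Start from the empty model, whose residual sum of squares is ||y||^2.
    best = {"rss": float(y @ y), "groups": (), "coef": np.zeros(p)}

    for size in range(1, k + 1):
        for chosen in combinations(range(len(groups)), size):
            cols = [j for g in chosen for j in groups[g]]
            # Unconstrained least squares restricted to the chosen columns.
            beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            resid = y - X[:, cols] @ beta
            rss = float(resid @ resid)
            if rss < best["rss"]:
                coef = np.zeros(p)
                coef[cols] = beta
                best = {"rss": rss, "groups": chosen, "coef": coef}
    return best


# Synthetic example (assumed data): only columns 1 and 4 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 6))
y = 3.0 * X[:, 1] - 2.0 * X[:, 4] + 0.1 * rng.normal(size=60)

fit = best_subset(X, y, k=2)
print("selected features:", fit["groups"])
print("coefficients:", np.round(fit["coef"], 2))

# Grouped variant: three hypothetical "kits" of two measurements each,
# with at most one kit allowed in the model.
kits = [[0, 1], [2, 3], [4, 5]]
kit_fit = best_subset(X, y, k=1, groups=kits)
print("selected kit:", kit_fit["groups"])
```

Raising k can only lower the training error, since every smaller subset remains admissible. Plotting that error against k, alongside the measurement cost of the selected features, gives the kind of sparsity versus accuracy trade-off the abstract describes.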

MeSH terms

  • Algorithms*
  • Gene Expression Profiling
  • Humans
  • Male
  • Prognosis
  • Prostatic Neoplasms* / genetics

Grant support

This study was supported by University of Turku Graduate School (MATTI), Academy of Finland (grants no. 310507, 313267 and 326238), Cancer Society of Finland, the Sigrid Jusélius Foundation to ASH, Academy of Finland (grants no. 319274 and 310507) and University of Turku to KJ, the Cancer Foundation Finland (grant no. 180132), Hospital District of Helsinki and Uusimaa (grants TYH2018214 and TYH2019235), and the Academy of Finland (grant no. 304667) to TM. The study was also supported by the Academy of Finland (grants no. 310507, 313267, 340141, 344698 and 345803), Helse Sør-Øst (2020026), Radium Hospital Foundation, Cancer Foundation Finland, the Sigrid Jusélius Foundation, and the European Union’s Horizon 2020 Research and Innovation Programme (ERA PerMed CLL-CLUE project) to TA, and the Finnish Cancer Institute (FICAN Cancer Researcher) and Finnish Cultural Foundation to TDL. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.