Multiobjective semisupervised learning with a right-censored endpoint adapted to the multiple imputation framework

Biom J. 2022 Dec;64(8):1446-1466. doi: 10.1002/bimj.202000365. Epub 2021 Jun 27.

Abstract

Semisupervised learning aims to use additional knowledge in the search for data structure. In clinical applications, including predictive information in the construction of a data-driven classification is of major importance. This work was motivated by a study that aimed to identify different patterns of immune parameters that would be associated with relapse-free survival in a cohort of breast cancer patients. Supervised and unsupervised objectives can be concomitantly optimized using multiobjective optimization. We propose such a procedure that addresses two challenges in the semisupervised approach, that is, missing data and additional knowledge based on survival time. The former was handled by using multiple imputation and consensus clustering. Survival information was incorporated in the supervised objective through the estimation of a cross-validation error of a Cox regression. A simulation study was performed to assess the performance of the proposed procedure. On complete datasets, the performances were compared to those of an existing modified multiobjective semisupervised learning method. The added value of including the survival data in the learning process was assessed by comparing the procedure to unsupervised learning. The proposed procedure showed better performance than the existing method, notably in the selection of the number of clusters. On incomplete datasets, the procedure showed little sensitivity to most of its parameters, even though a high number of imputations and partition initialization seeds improved the performance. The performance was degraded with a high proportion of missing data (40%) and with more ambiguous data structures. Simulation results and application on real data support the conclusion that our procedure enables the construction of a classification associated with a right-censored endpoint on a possibly incomplete dataset.
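As a minimal sketch of two ideas named in the abstract, the Python below (standard library only) merges partitions obtained on several imputed datasets into one consensus partition, then keeps the candidate solutions that are not dominated on two objectives. It is not the authors' implementation. The function names, the 0.5 co-association threshold, the connected-components consensus rule, and all numeric values are assumptions made for illustration. In particular, the supervised objective here is a placeholder number and not a cross-validated Cox regression error computed from data.

```python
def co_association(partitions):
    """Fraction of partitions in which each pair of items shares a cluster."""
    n = len(partitions[0])
    m = len(partitions)
    counts = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    return [[c / m for c in row] for row in counts]


def consensus_partition(partitions, threshold=0.5):
    """Link items co-clustered in more than `threshold` of the partitions,
    then label the connected components (an assumed, simple consensus rule)."""
    coassoc = co_association(partitions)
    n = len(coassoc)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and coassoc[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def pareto_front(candidates):
    """Names of candidates not dominated by any other; both objectives are minimized."""
    front = []
    for name, obj in candidates.items():
        dominated = any(
            all(o <= s for o, s in zip(other, obj)) and other != obj
            for other_name, other in candidates.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)


# One partition per imputed dataset (made-up labels; cluster names may be switched).
partitions = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
consensus = consensus_partition(partitions)

# Made-up (unsupervised objective, supervised objective) pairs per candidate.
candidates = {
    "k=2": (4.0, 0.30),
    "k=3": (2.5, 0.35),
    "k=4": (2.6, 0.40),
}
front = pareto_front(candidates)
print(consensus, front)
```

The co-association matrix is insensitive to label switching between imputed datasets, which is why the second partition, with its cluster names swapped, still agrees with the first. The Pareto filter only returns the set of trade-off solutions; choosing one of them, for example to fix the number of clusters, requires a further selection rule that this sketch does not cover.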

Keywords: consensus; multiobjective optimization; multiple imputation; semisupervised learning; survival endpoint.

MeSH terms

  • Algorithms*
  • Cluster Analysis
  • Computer Simulation
  • Humans
  • Neoplasm Recurrence, Local*