PCM-SABRE: a platform for benchmarking and comparing outcome prediction methods in precision cancer medicine

BMC Bioinformatics. 2017 Jan 17;18(1):40. doi: 10.1186/s12859-016-1435-5.

Abstract

Background: Numerous publications attempt to predict cancer survival outcome from gene expression data using machine-learning methods. A direct comparison of these works is challenging for the following reasons: (1) inconsistent measures used to evaluate the performance of different models, and (2) incomplete specification of critical stages in the process of knowledge discovery. There is a need for a platform that would allow researchers to replicate previous works and to test the impact of changes in the knowledge discovery process on the accuracy of the induced models.

Results: We developed the PCM-SABRE platform, which supports the entire knowledge discovery process for cancer outcome analysis. PCM-SABRE was developed using KNIME. By using PCM-SABRE to reproduce the results of previously published works on breast cancer survival, we define a baseline for evaluating future attempts to predict cancer outcome with machine learning. We used PCM-SABRE to replicate previous works that describe predictive models of breast cancer recurrence, and tested the performance of all possible combinations of the feature selection methods and data mining algorithms that were used in either of the works. We reconstructed the work of Chou et al. and observed similar trends: superior performance of the Probabilistic Neural Network (PNN) and logistic regression (LR) algorithms, and an inconclusive impact of feature pre-selection with the decision tree algorithm on subsequent analysis.
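The idea of testing every combination of feature selection method and data mining algorithm under one evaluation measure can be illustrated with a minimal Python sketch. This is not the PCM-SABRE implementation (the platform is built in KNIME), and everything in it is an assumption made for illustration: the data are synthetic, the function names are hypothetical, and the two simple learners (nearest centroid and a majority-class baseline) stand in for the algorithms named in the abstract.

```python
import itertools
import random
import statistics


def make_toy_data(n_samples=60, n_features=8, seed=0):
    """Synthetic stand-in for expression data; only feature 0 carries signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # balanced binary outcome (hypothetical: 1 = recurrence)
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        row[0] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y


def select_all(X, y, k):
    """No feature selection: keep every column."""
    return list(range(len(X[0])))


def select_top_k_mean_diff(X, y, k):
    """Keep the k columns with the largest absolute difference of class means."""
    def score(j):
        pos = [row[j] for row, lab in zip(X, y) if lab == 1]
        neg = [row[j] for row, lab in zip(X, y) if lab == 0]
        return abs(statistics.mean(pos) - statistics.mean(neg))
    return sorted(range(len(X[0])), key=score, reverse=True)[:k]


def fit_nearest_centroid(X, y):
    """Return a predictor that assigns the class of the closest class centroid."""
    centroids = {}
    for lab in sorted(set(y)):
        rows = [row for row, l in zip(X, y) if l == lab]
        centroids[lab] = [statistics.mean(col) for col in zip(*rows)]

    def predict(row):
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, centroids[lab])),
        )
    return predict


def fit_majority_class(X, y):
    """Baseline predictor: always return the most frequent training label."""
    majority = max(sorted(set(y)), key=y.count)
    return lambda row: majority


def cross_val_accuracy(X, y, selector, learner, k_features=2, n_folds=5):
    """Accuracy over n_folds folds; features are selected inside each fold."""
    correct = 0
    for fold in range(n_folds):
        train_idx = [i for i in range(len(X)) if i % n_folds != fold]
        test_idx = [i for i in range(len(X)) if i % n_folds == fold]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        cols = selector(X_train, y_train, k_features)
        X_train_sel = [[row[j] for j in cols] for row in X_train]
        predict = learner(X_train_sel, y_train)
        for i in test_idx:
            if predict([X[i][j] for j in cols]) == y[i]:
                correct += 1
    return correct / len(X)


selectors = {"all_features": select_all, "top_k_mean_diff": select_top_k_mean_diff}
learners = {"nearest_centroid": fit_nearest_centroid, "majority_class": fit_majority_class}

X, y = make_toy_data()
results = {}
# Evaluate every (feature selection, learner) pair with the same measure and folds.
for (s_name, selector), (l_name, learner) in itertools.product(
    selectors.items(), learners.items()
):
    results[(s_name, l_name)] = cross_val_accuracy(X, y, selector, learner)

for combo, acc in sorted(results.items()):
    print(combo, round(acc, 3))
```

Holding the folds and the performance measure fixed across all pairs is what makes the resulting numbers directly comparable, and selecting features inside each training fold keeps test samples from influencing the selection step.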

Conclusions: PCM-SABRE is a software tool that provides an intuitive environment for rapid development of predictive models in cancer precision medicine.

Keywords: Breast cancer; Data mining; Reproducible research.

MeSH terms

  • Algorithms
  • Benchmarking
  • Breast Neoplasms / prevention & control*
  • Female
  • Humans
  • Logistic Models
  • Machine Learning*
  • Neoplasm Recurrence, Local / prevention & control*
  • Neural Networks, Computer*
  • Precision Medicine / methods*
  • Software