Prognostic gene signatures for patient stratification in breast cancer: accuracy, stability and interpretability of gene selection approaches using prior knowledge on protein-protein interactions

Yupeng Cun; Holger Fröhlich

doi:10.1186/1471-2105-13-69

BMC Bioinformatics. 2012 May 1;13:69. doi: 10.1186/1471-2105-13-69.

Authors

Yupeng Cun¹, Holger Fröhlich

Affiliation

¹ Algorithmic Bioinformatics, Bonn-Aachen International Center for IT, Dahlmannstraße, Bonn, Germany.

Abstract

Background: Stratifying patients according to their clinical prognosis is a desirable goal in cancer treatment, as it would allow for better personalized medicine. Reliable predictions based on gene signatures could support physicians in selecting the right therapeutic strategy. In recent years, however, the low reproducibility of many published gene signatures has been criticized. It has been suggested that incorporating network or pathway information into prognostic biomarker discovery could improve prediction performance. Meanwhile, a large number of different approaches have been proposed for this purpose.

Methods: We found that, on average, incorporating pathway information or protein interaction data did not significantly enhance prediction performance, but it did greatly improve the interpretability of gene signatures. Some methods (specifically network-based SVMs) greatly enhanced gene selection stability but achieved only comparatively low prediction accuracy, whereas Reweighted Recursive Feature Elimination (RRFE) and average pathway expression led to very clearly interpretable signatures. In addition, average pathway expression, together with elastic net SVMs, showed the highest prediction performance in this comparison.

Results: The results indicated that no single algorithm performed best with respect to all three categories in our study (accuracy, stability and interpretability). In general, incorporating network-based prior knowledge into gene selection methods did not significantly improve classification accuracy, but it greatly improved the interpretability of gene signatures compared to classical algorithms.
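
Two of the ideas named in the abstract, average pathway expression and gene selection stability, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the toy data and the use of mean pairwise Jaccard overlap as a stability measure are assumptions made here for illustration, and the study may define stability differently.

```python
import numpy as np


def average_pathway_expression(expr, gene_names, pathways):
    """Collapse a samples x genes matrix into samples x pathways features.

    Each pathway feature is the mean expression of the pathway's member
    genes that are present in the data. Pathways with no measured member
    gene are skipped. (Hypothetical helper, for illustration only.)
    """
    index = {gene: i for i, gene in enumerate(gene_names)}
    names, columns = [], []
    for name, members in pathways.items():
        cols = [index[g] for g in members if g in index]
        if not cols:
            continue  # no member gene of this pathway was measured
        names.append(name)
        columns.append(expr[:, cols].mean(axis=1))
    if not columns:
        return names, np.empty((expr.shape[0], 0))
    return names, np.column_stack(columns)


def jaccard_stability(signatures):
    """Mean pairwise Jaccard overlap between selected gene sets.

    Assumption: used here as one simple stability measure; 1.0 means the
    same genes were selected in every run, 0.0 means no overlap at all.
    """
    sets = [set(s) for s in signatures]
    if len(sets) < 2:
        raise ValueError("need at least two signatures to compare")
    scores = []
    for i, a in enumerate(sets):
        for b in sets[i + 1:]:
            union = a | b
            scores.append(len(a & b) / len(union) if union else 1.0)
    return sum(scores) / len(scores)


# Toy data: 2 samples x 3 genes, and made-up pathway memberships.
expr = np.array([[1.0, 3.0, 5.0],
                 [2.0, 4.0, 6.0]])
genes = ["A", "B", "C"]
pathways = {"P1": ["A", "B"], "P2": ["C", "Z"], "P3": ["Q"]}

pathway_names, pathway_features = average_pathway_expression(expr, genes, pathways)
stability = jaccard_stability([["A", "B"], ["B", "C"]])
```

The resulting pathway-level matrix could then be passed to any classifier in place of the gene-level matrix. Because each feature corresponds to a named pathway, a signature built this way is arguably easier to read than a list of individual genes.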

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Adult
Algorithms*
Biomarkers / analysis*
Breast Neoplasms / diagnosis
Breast Neoplasms / genetics*
Female
Forecasting
Gene Expression Profiling / methods*
Genes, Neoplasm
Humans
Prognosis
Protein Interaction Mapping
Reproducibility of Results
Support Vector Machine

Substances

Biomarkers