HiFreSP: A novel high-frequency sub-pathway mining approach to identify robust prognostic gene signatures

Brief Bioinform. 2020 Jul 15;21(4):1411-1424. doi: 10.1093/bib/bbz078.


With growing awareness of heterogeneity in cancers, better prediction of cancer prognosis is needed to support more personalized treatment. Recently, extensive efforts have been made to exploit variations in gene expression for prognosis prediction. However, the prognostic gene signatures identified by most existing methods show little robustness across different datasets of the same cancer. To improve the robustness of gene signatures, we propose a novel high-frequency sub-pathway mining approach (HiFreSP) that integrates a randomization strategy with gene interaction pathways. Using HiFreSP, we identified a six-gene signature (CCND1, CSF3R, E2F2, JUP, RARA and TCF7) in esophageal squamous cell carcinoma (ESCC). This signature strongly predicted the clinical outcome of ESCC patients in two independent datasets (log-rank test, P = 0.0045 and 0.0087). To further evaluate the predictive performance of HiFreSP, we applied it to two other cancers: pancreatic adenocarcinoma and breast cancer. The identified signatures showed high predictive power in all testing datasets of the two cancers. Furthermore, compared with two popular methods for predicting prognostic signatures, the least absolute shrinkage and selection operator (LASSO) penalized Cox proportional hazards model and the random survival forest, HiFreSP showed better predictive accuracy and generalization across all testing datasets of the above three cancers. Lastly, we applied HiFreSP to 8137 patients across 20 cancer types in the TCGA database and found high-frequency prognosis-associated pathways in many cancers. Taken together, HiFreSP shows higher prognostic capability and greater robustness, and the identified signatures provide clinical guidance for cancer prognosis. HiFreSP is freely available via GitHub: https://github.com/chunquanlipathway/HiFreSP.
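
The abstract does not spell out the algorithm, so the following is only a generic illustration of the idea suggested by the name and keywords: keep the candidates that are selected most often across bootstrap training sets. It is not the HiFreSP implementation. The function names, the toy data, the scoring rule (gap in mean survival time between high- and low-expression groups) and the thresholds are all assumptions made for this sketch; a real analysis would use pathway-derived sub-pathways and a proper survival model.

```python
import random
from collections import Counter
from statistics import mean, median


def separation_score(candidate, boot):
    """Hypothetical score (an assumption, not the paper's): gap in mean
    survival time between samples with high vs. low mean expression of
    the candidate's genes."""
    levels = [mean(expr[g] for g in candidate) for expr, _ in boot]
    cut = median(levels)
    high = [t for lvl, (_, t) in zip(levels, boot) if lvl > cut]
    low = [t for lvl, (_, t) in zip(levels, boot) if lvl <= cut]
    if not high or not low:  # degenerate resample: no split possible
        return 0.0
    return abs(mean(high) - mean(low))


def high_frequency_candidates(samples, candidates, score,
                              n_boot=200, top_k=1, min_freq=0.5, seed=0):
    """Return {candidate: selection frequency} for candidates that rank in
    the top_k on at least min_freq of the bootstrap training sets."""
    rng = random.Random(seed)
    counts = Counter()
    n = len(samples)
    for _ in range(n_boot):
        # One bootstrap training set: n samples drawn with replacement.
        boot = [samples[rng.randrange(n)] for _ in range(n)]
        ranked = sorted(candidates, key=lambda c: score(c, boot), reverse=True)
        counts.update(ranked[:top_k])
    return {c: counts[c] / n_boot
            for c in candidates if counts[c] / n_boot >= min_freq}


# Toy data (hypothetical): gene "A" tracks survival time, gene "B" does not.
samples = [({"A": float(i), "B": float(i % 2)}, 100.0 - 5.0 * i)
           for i in range(20)]
candidates = [("A",), ("B",)]  # each candidate is a tuple of gene names

freq = high_frequency_candidates(samples, candidates, separation_score)
print(freq)
```

Counting selections over many resamples, rather than trusting a single fit, is what makes the result less sensitive to the particular patients in one training set; the frequency threshold `min_freq` controls how strict that filter is.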

Keywords: RNA-Seq; bootstrap training sets; cancer prognosis; pathway.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Gene Expression Profiling*
  • Humans
  • Neoplasms / genetics*
  • Prognosis