Classifying ten types of major cancers based on reverse phase protein array profiles

PLoS One. 2015 Mar 30;10(3):e0123147. doi: 10.1371/journal.pone.0123147. eCollection 2015.

Abstract

As ever larger cancer genome data sets accumulate, more efficient and automated procedures are needed to classify cancer types and to identify a small set of essential genes that distinguish different cancers. Because protein expression is more stable than gene expression, we based our study on data from reverse phase protein arrays (RPPA), a powerful and robust antibody-based, high-throughput approach to targeted proteomics. We propose a computational framework that classifies patient samples into ten major cancer types from RPPA data using the SMO (Sequential Minimal Optimization) method. A careful feature selection procedure, combining mRMR (minimum Redundancy Maximum Relevance) and IFS (Incremental Feature Selection) on the training set, selected 23 important proteins from a total of 187. Using these 23 proteins, we classified the ten cancer types with an MCC (Matthews Correlation Coefficient) of 0.904 on the training set, evaluated by 10-fold cross-validation, and an MCC of 0.936 on an independent test set. We then analyzed the 23 proteins further. Most of them reflect the hallmarks of cancer; Chk2, for example, plays an important role in the proliferation of cancer cells. This analysis lends credence to the importance of these proteins as indicators for cancer classification. We also believe our methods and findings may shed light on the discovery of specific biomarkers for different types of cancer.
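The selection and scoring steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say which multi-class form of the MCC was used, so the sketch assumes the common generalization computed from true and predicted label counts. The IFS step is reduced to its core idea: evaluate growing prefixes of a ranked feature list and keep the best-scoring one. The names `multiclass_mcc`, `incremental_feature_selection` and `score_fn` are hypothetical; `score_fn` stands in for training a classifier on a feature subset and returning its cross-validated MCC.

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Multi-class Matthews correlation coefficient from label lists.

    Assumption: uses the count-based generalization of the binary MCC;
    the abstract does not specify which multi-class form was applied.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    s = len(y_true)                                   # total samples
    c = sum(1 for t, p in zip(y_true, y_pred) if t == p)  # correct predictions
    t_counts = Counter(y_true)                        # true occurrences per class
    p_counts = Counter(y_pred)                        # predictions per class
    labels = set(t_counts) | set(p_counts)
    numerator = c * s - sum(t_counts[k] * p_counts[k] for k in labels)
    cov_true = s * s - sum(n * n for n in t_counts.values())
    cov_pred = s * s - sum(n * n for n in p_counts.values())
    denominator = sqrt(cov_true * cov_pred)
    # By convention, return 0.0 when the coefficient is undefined
    # (for example, when every sample is assigned to a single class).
    return numerator / denominator if denominator else 0.0


def incremental_feature_selection(ranked_features, score_fn):
    """Return the best-scoring prefix of a ranked feature list.

    ranked_features: features ordered from most to least important,
        for example by an mRMR ranking computed elsewhere.
    score_fn: hypothetical callable mapping a feature subset to a score,
        for example a cross-validated MCC.
    Ties keep the smaller subset because only strict improvements replace it.
    """
    best_subset, best_score = [], float("-inf")
    for k in range(1, len(ranked_features) + 1):
        subset = list(ranked_features[:k])
        score = score_fn(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score


if __name__ == "__main__":
    # Toy labels only; these are not data from the study.
    truth = ["A", "A", "B", "B", "C", "C"]
    guess = ["A", "A", "B", "C", "C", "C"]
    print(multiclass_mcc(truth, guess))

    # Toy scores indexed by subset size; placeholder feature names.
    toy_scores = {1: 0.5, 2: 0.8, 3: 0.8, 4: 0.7}
    ranked = ["p1", "p2", "p3", "p4"]
    print(incremental_feature_selection(ranked, lambda fs: toy_scores[len(fs)]))
```

In a real pipeline, `score_fn` would wrap the classifier and the cross-validation loop, and the ranked list would come from an mRMR implementation; both are left out here to keep the sketch self-contained.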

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods
  • Humans
  • Neoplasms / metabolism*
  • Protein Array Analysis / methods
  • Proteins / metabolism*
  • Proteome / metabolism

Substances

  • Proteins
  • Proteome

Grants and funding

This work was supported by grants from the National Basic Research Program of China (2011CB510102, 2011CB510101), the National Natural Science Foundation of China (31371335, 81171342, 81201148, 61401302), the Innovation Program of Shanghai Municipal Education Commission (12ZZ087), the grant of “The First-class Discipline of Universities in Shanghai”, the Tianjin Research Program of Application Foundation and Advanced Technology (14JCQNJC09500), the National Research Foundation for the Doctoral Program of Higher Education of China (20130032120070, 20120032120073) and the Independent Innovation Foundation of Tianjin University (60302064, 60302069). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.