Machine Learning-based Classification of Diffuse Large B-cell Lymphoma Patients by Their Protein Expression Profiles

Mol Cell Proteomics. 2015 Nov;14(11):2947-60. doi: 10.1074/mcp.M115.050245. Epub 2015 Aug 26.


Characterization of tumors at the molecular level has improved our knowledge of cancer causation and progression. Proteomic analysis of their signaling pathways promises to enhance our understanding of cancer aberrations at the functional level, but this requires accurate and robust tools. Here, we develop a state-of-the-art quantitative mass spectrometric pipeline to characterize formalin-fixed paraffin-embedded tissues of patients with closely related subtypes of diffuse large B-cell lymphoma. We combined a super-SILAC approach with label-free quantification (hybrid LFQ) to address situations where the protein is absent in the super-SILAC standard but present in the patient samples. Shotgun proteomic analysis on a quadrupole Orbitrap quantified almost 9,000 tumor proteins in 20 patients. The quantitative accuracy of our approach allowed the segregation of diffuse large B-cell lymphoma patients according to their cell of origin using both their global protein expression patterns and the 55-protein signature obtained previously from patient-derived cell lines (Deeb, S. J., D'Souza, R. C., Cox, J., Schmidt-Supprian, M., and Mann, M. (2012) Mol. Cell. Proteomics 11, 77-89). Expression levels of individual segregation-driving proteins, as well as categories such as extracellular matrix proteins, behaved consistently with known trends between the subtypes. We used machine learning (support vector machines) to extract candidate proteins with the highest segregating power. A panel of four proteins (PALD1, MME, TNFAIP8, and TBC1D4) is predicted to classify patients with low error rates. Highly ranked proteins from the support vector analysis revealed differential expression of core signaling molecules between the subtypes, elucidating aspects of their pathobiology.
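To make the idea of extracting proteins with the highest segregating power more concrete, the following is a minimal sketch of one common way to do it: train a linear support vector machine, rank features by the magnitude of their weights, and estimate the error of a small panel by leave-one-out cross-validation. It is not the authors' pipeline. The abstract does not state the kernel, the ranking procedure, or the software used, so the subgradient-descent training, the absence of a bias term (which assumes centered expression values), the function names, and the synthetic data (two informative "proteins" and four noise features for 20 toy "patients") are all assumptions made for illustration.

```python
import random


def make_toy_data(n_per_class=10, n_noise=4, seed=0):
    """Synthetic, centered expression values; NOT real patient data.

    Features 0 and 1 differ between the two classes; the rest are noise.
    """
    rng = random.Random(seed)
    X, y = [], []
    for label in (1, -1):  # two hypothetical subtypes
        for _ in range(n_per_class):
            informative = [label * 2.0 + rng.uniform(-0.3, 0.3) for _ in range(2)]
            noise = [rng.uniform(-0.3, 0.3) for _ in range(n_noise)]
            X.append(informative + noise)
            y.append(label)
    return X, y


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Batch subgradient descent on the L2-regularized hinge loss (no bias)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [lam * wj for wj in w]  # gradient of the regularizer
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1.0:  # only margin violators contribute
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def rank_features(w):
    """Feature indices sorted by absolute SVM weight, largest first."""
    return sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)


def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


def loo_error(X, y, k):
    """Leave-one-out error of a top-k panel; ranking is redone in every fold."""
    errors = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        top = rank_features(train_linear_svm(X_train, y_train))[:k]
        w = train_linear_svm([[x[j] for j in top] for x in X_train], y_train)
        errors += predict(w, [X[i][j] for j in top]) != y[i]
    return errors / len(X)


X, y = make_toy_data()
ranking = rank_features(train_linear_svm(X, y))
error = loo_error(X, y, k=2)
print("feature ranking:", ranking)
print("leave-one-out error of top-2 panel:", error)
```

Repeating the ranking inside each fold keeps the held-out sample from influencing which features enter the panel, which matters when the number of samples is small compared with the number of quantified proteins.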

MeSH terms

  • Apoptosis Regulatory Proteins / genetics
  • Apoptosis Regulatory Proteins / metabolism
  • Biomarkers, Tumor / genetics*
  • Biomarkers, Tumor / metabolism
  • Cell Line, Tumor
  • Formaldehyde
  • GTPase-Activating Proteins / genetics
  • GTPase-Activating Proteins / metabolism
  • Gene Expression Regulation, Neoplastic*
  • Humans
  • Isotope Labeling / methods
  • Lymphoma, Large B-Cell, Diffuse / diagnosis
  • Lymphoma, Large B-Cell, Diffuse / genetics*
  • Lymphoma, Large B-Cell, Diffuse / metabolism
  • Lymphoma, Large B-Cell, Diffuse / pathology
  • Machine Learning*
  • Neoplasm Proteins / genetics*
  • Neoplasm Proteins / metabolism
  • Neprilysin / genetics
  • Neprilysin / metabolism
  • Phosphoprotein Phosphatases / genetics
  • Phosphoprotein Phosphatases / metabolism
  • Principal Component Analysis
  • Proteome / genetics*
  • Proteome / metabolism
  • Proteomics / methods
  • Signal Transduction
  • Tissue Embedding
  • Tissue Fixation


Substances

  • Apoptosis Regulatory Proteins
  • Biomarkers, Tumor
  • GTPase-Activating Proteins
  • Neoplasm Proteins
  • Proteome
  • TBC1D4 protein, human
  • TNFAIP8 protein, human
  • Formaldehyde
  • PALD1 protein, human
  • Phosphoprotein Phosphatases
  • Neprilysin