Combining gene expression profiling and machine learning to diagnose B-cell non-Hodgkin lymphoma

Blood Cancer J. 2020 May 22;10(5):59. doi: 10.1038/s41408-020-0322-5.


Non-Hodgkin B-cell lymphomas (B-NHLs) are a highly heterogeneous group of mature B-cell malignancies. Their classification requires skillful evaluation by expert hematopathologists, yet the risk of error remains higher in these tumors than in many other areas of pathology. To facilitate diagnosis, we developed a gene expression assay able to discriminate the seven most frequent B-NHL categories. The assay combines ligation-dependent RT-PCR with next-generation sequencing and measures the expression of more than 130 genetic markers. It was designed to retrieve the main gene expression signatures of B-NHL cells and their microenvironment. Classification is handled by a random forest algorithm, which we trained and validated on a large cohort of more than 400 annotated cases of different histologies. Its clinical relevance was verified through its capacity to prevent important misclassifications in low-grade lymphomas and to retrieve clinically important characteristics in high-grade lymphomas, including the cell-of-origin signatures and the MYC and BCL2 expression levels. This accurate pan-B-NHL predictor, which allows a systematic evaluation of numerous diagnostic and prognostic markers, could be proposed as a complement to conventional histology to guide the management of patients and facilitate their stratification into clinical trials.
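
As a rough illustration of the classification step described above, the following minimal sketch trains a random forest on a marker-by-case expression matrix and reports held-out accuracy and per-class probabilities. It is not the authors' pipeline. The data are synthetic, the cohort size, marker count, and forest settings are placeholders chosen only to match the orders of magnitude given in the abstract, and the function names are hypothetical. It assumes NumPy and scikit-learn are available.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Placeholder sizes loosely echoing the abstract (assumptions, not study values).
N_CASES = 420      # "more than 400 annotated cases"
N_MARKERS = 130    # "more than 130 genetic markers"
N_CLASSES = 7      # "seven most frequent B-NHL categories"


def make_synthetic_cohort(seed=0):
    """Simulate an expression matrix in which each class has its own mean profile."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, N_CLASSES, size=N_CASES)
    class_profiles = rng.normal(0.0, 1.0, size=(N_CLASSES, N_MARKERS))
    noise = rng.normal(0.0, 1.0, size=(N_CASES, N_MARKERS))
    expression = class_profiles[labels] + noise
    return expression, labels


def train_and_evaluate(seed=0):
    """Fit a random forest on a training split and score it on a held-out split."""
    X, y = make_synthetic_cohort(seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(X_train, y_train)
    accuracy = forest.score(X_test, y_test)
    # One row per held-out case, one column per class; usable as a confidence score.
    class_probabilities = forest.predict_proba(X_test)
    return forest, accuracy, class_probabilities


if __name__ == "__main__":
    _, accuracy, probabilities = train_and_evaluate()
    print(f"held-out accuracy: {accuracy:.2f}")
    print("probabilities for first held-out case:", probabilities[0].round(2))
```

In a sketch like this, the per-class probabilities could be used to flag low-confidence cases for expert review, and `forest.feature_importances_` gives a first look at which markers drive the separation.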

Publication types

  • Clinical Trial
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biomarkers, Tumor / genetics
  • Diagnosis, Computer-Assisted
  • Gene Expression Profiling
  • Humans
  • Lymphoma, B-Cell / classification
  • Lymphoma, B-Cell / diagnosis*
  • Lymphoma, B-Cell / genetics
  • Machine Learning*
  • Progression-Free Survival
  • Transcriptome*
  • Tumor Microenvironment


Substances

  • Biomarkers, Tumor