A robust and interpretable end-to-end deep learning model for cytometry data

Proc Natl Acad Sci U S A. 2020 Sep 1;117(35):21373-21380. doi: 10.1073/pnas.2003026117. Epub 2020 Aug 14.


Cytometry technologies are essential tools for immunology research, providing high-throughput measurements of immune cells at the single-cell level. Existing approaches to interpreting and using cytometry measurements include manual or automated gating to identify cell subsets. Gating provides highly intuitive results but may lead to significant information loss, because additional details in measured or correlated cell signals might be missed. In this study, we propose and test a deep convolutional neural network for analyzing cytometry data in an end-to-end fashion, allowing a direct association between raw cytometry data and the clinical outcome of interest. Using nine large studies from the open-access ImmPort database that used cytometry by time-of-flight mass spectrometry (CyTOF, also called mass cytometry), we demonstrated that the deep convolutional neural network model can accurately diagnose latent cytomegalovirus (CMV) infection in healthy individuals, even when using highly heterogeneous data from different studies. In addition, we developed a permutation-based method for interpreting the deep convolutional neural network model. With it, we identified a CD27- CD94+ CD8+ T cell population significantly associated with latent CMV infection, confirming findings from previous studies. Finally, we provide a tutorial for creating, training, and interpreting the tailored deep learning model for cytometry data using Keras and TensorFlow (https://github.com/hzc363/DeepLearningCyTOF).
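The abstract describes two steps: a network that maps a sample's raw cells-by-markers matrix directly to a clinical outcome, and a permutation-based method for interpreting that network. The dependency-free Python sketch below illustrates both ideas under stated assumptions. It is not the authors' model (their tutorial uses Keras and TensorFlow), and every name in it (`predict`, `log_loss`, `permutation_importance`, the `Params` layout and the toy data) is hypothetical. It assumes the network applies the same small set of filters to every cell (equivalent to a one-dimensional convolution with kernel width 1 along the cell axis), averages the filter activations over all cells, and feeds the pooled values to a logistic output. For interpretation it uses generic permutation importance: shuffle one marker's values across all cells and report the mean increase in loss. This is one common variant and not necessarily the procedure used in the study.

```python
import math
import random
from typing import List, Sequence, Tuple

# A sample is a cells-by-markers matrix: one row per cell, one column per marker.
Sample = List[List[float]]
# Hypothetical parameter layout: (filter_weights, filter_biases, output_weights, output_bias).
Params = Tuple[List[List[float]], List[float], List[float], float]


def predict(sample: Sample, params: Params) -> float:
    """Probability of the positive class (e.g. latent CMV) for one sample."""
    filt_w, filt_b, out_w, out_b = params
    pooled = [0.0] * len(filt_w)
    for cell in sample:
        # The same filters are applied to every cell (like a width-1 convolution).
        for k, (w_row, b) in enumerate(zip(filt_w, filt_b)):
            act = max(0.0, sum(w * x for w, x in zip(w_row, cell)) + b)  # ReLU
            pooled[k] += act / len(sample)  # average pooling over cells
    z = sum(w * p for w, p in zip(out_w, pooled)) + out_b
    return 1.0 / (1.0 + math.exp(-z))  # logistic output


def log_loss(samples: Sequence[Sample], labels: Sequence[int], params: Params) -> float:
    """Mean binary cross-entropy over samples."""
    total = 0.0
    for sample, y in zip(samples, labels):
        p = min(max(predict(sample, params), 1e-12), 1.0 - 1e-12)  # avoid log(0)
        total += -math.log(p) if y == 1 else -math.log(1.0 - p)
    return total / len(samples)


def permutation_importance(samples: Sequence[Sample], labels: Sequence[int],
                           params: Params, marker: int,
                           n_repeats: int = 5, seed: int = 0) -> float:
    """Mean increase in loss after shuffling one marker's values across all cells."""
    rng = random.Random(seed)
    baseline = log_loss(samples, labels, params)
    increases = []
    for _ in range(n_repeats):
        # Pool this marker's values from every cell of every sample and shuffle them.
        values = [cell[marker] for sample in samples for cell in sample]
        rng.shuffle(values)
        it = iter(values)
        # Rebuild the samples with the shuffled marker column; other markers are untouched.
        permuted = [
            [cell[:marker] + [next(it)] + cell[marker + 1:] for cell in sample]
            for sample in samples
        ]
        increases.append(log_loss(permuted, labels, params) - baseline)
    return sum(increases) / n_repeats


# Toy data, made up for illustration: marker 0 separates the classes, marker 1 is noise.
positives = [[[1.0, 0.3], [1.0, 0.7], [1.0, 0.5]] for _ in range(3)]
negatives = [[[0.0, 0.6], [0.0, 0.2], [0.0, 0.4]] for _ in range(3)]
samples = positives + negatives
labels = [1, 1, 1, 0, 0, 0]
# One filter that reads only marker 0, followed by a logistic output layer.
params: Params = ([[1.0, 0.0]], [0.0], [4.0], -2.0)

for m in (0, 1):
    print(f"marker {m}: importance {permutation_importance(samples, labels, params, m):.4f}")
```

Because the pooling step averages over cells, the prediction does not depend on the order of cells within a sample, so each sample is handled as an unordered collection of cells. Permuting a marker across cells destroys both its link to the outcome and its correlation with other markers measured on the same cell; a marker the model ignores (marker 1 in the toy data) therefore has an importance of exactly zero.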

Keywords: CyTOF; cytomegalovirus; deep learning; flow cytometry; model interpretation.

Publication types

  • Evaluation Study
  • Research Support, N.I.H., Extramural

MeSH terms

  • Cytomegalovirus Infections / diagnosis
  • Deep Learning*
  • Flow Cytometry*
  • Humans
  • T-Lymphocytes / cytology