immunoClust--An automated analysis pipeline for the identification of immunophenotypic signatures in high-dimensional cytometric datasets

Cytometry A. 2015 Jul;87(7):603-15. doi: 10.1002/cyto.a.22626. Epub 2015 Apr 7.


Multiparametric fluorescence and mass cytometry offer new perspectives for disclosing and monitoring the high diversity of cell populations in the peripheral blood for biomarker research. While high-end cytometric devices are currently available that can theoretically detect up to 120 individual parameters at the single-cell level, software tools are needed to analyze these complex datasets automatically, in acceptable time, and without operator bias or knowledge. We developed an automated analysis pipeline, immunoClust, for uncompensated fluorescence and mass cytometry data, which consists of two parts. First, the cell events of each sample are grouped into individual clusters. Subsequently, a classification algorithm sorts these cell event clusters into populations that are comparable between different samples. The clustering of cell events is designed as a global unsupervised method for datasets with large event counts in high dimensions, and it is sensitive enough to identify rare cell types even when they lie next to large populations. Both parts use model-based clustering with an iterative expectation maximization algorithm and the integrated classification likelihood to obtain the clusters. A detailed description of both algorithms is presented. Testing and validation were performed using 1) blood cell samples of defined composition that were depleted of particular cell subsets by magnetic cell sorting, 2) datasets of the FlowCAP III challenges to identify populations of rare cell types, and 3) high-dimensional fluorescence and mass cytometry datasets for comparison with conventional manual gating procedures. In conclusion, the immunoClust algorithm is a promising tool to standardize and automate the analysis of high-dimensional cytometric datasets. As a prerequisite for the interpretation of such data, it will support our efforts in developing immunological biomarkers for chronic inflammatory disorders and therapy recommendations in personalized medicine.
immunoClust is implemented as an R-package and is provided as source code from
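
The combination of expectation maximization and the integrated classification likelihood named in the abstract can be illustrated with a minimal sketch. It is not the immunoClust implementation: it assumes univariate data, plain Gaussian components, a crude k-means seeding, and the common BIC-style approximation of the ICL (twice the log-likelihood, minus the parameter penalty, minus twice the entropy of the soft assignments). All function names and the synthetic data (a large population with a rare one beside it) are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def kmeans_init(data, k, min_var, rounds=20):
    """Crude 1-D k-means, used only to seed EM (an illustrative choice)."""
    lo, hi = min(data), max(data)
    means = [lo + (j + 0.5) * (hi - lo) / k for j in range(k)]
    for _ in range(rounds):
        groups = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: abs(x - means[j]))
            groups[nearest].append(x)
        # Empty groups keep their previous mean.
        means = [sum(g) / len(g) if g else m for g, m in zip(groups, means)]
    weights = [max(len(g), 1) / len(data) for g in groups]
    variances = [max(sum((x - m) ** 2 for x in g) / len(g), min_var) if g else 1.0
                 for g, m in zip(groups, means)]
    return weights, means, variances


def e_step(data, weights, means, variances):
    """Return soft assignments and the observed-data log-likelihood."""
    resp, loglik = [], 0.0
    for x in data:
        dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = max(sum(dens), 1e-300)  # guard against numerical underflow
        resp.append([d / total for d in dens])
        loglik += math.log(total)
    return resp, loglik


def fit_mixture(data, k, iterations=100, min_var=1e-3):
    """Fit a k-component Gaussian mixture by EM with a fixed iteration count."""
    n = len(data)
    weights, means, variances = kmeans_init(data, k, min_var)
    for _ in range(iterations):
        resp = e_step(data, weights, means, variances)[0]
        # M-step: re-estimate each component from its weighted events.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(spread, min_var)  # floor avoids collapsed components
    resp, loglik = e_step(data, weights, means, variances)
    return weights, means, variances, resp, loglik


def icl(data, k):
    """BIC-style ICL approximation; higher values indicate a better model."""
    resp, loglik = fit_mixture(data, k)[3:]
    n_params = 3 * k - 1  # k means + k variances + (k - 1) free weights
    entropy = -sum(r * math.log(r) for row in resp for r in row if r > 0.0)
    return 2.0 * loglik - n_params * math.log(len(data)) - 2.0 * entropy


# Synthetic sample: 950 events in a large population, 50 in a rare one.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(950)]
data += [rng.gauss(8.0, 0.5) for _ in range(50)]

scores = {k: icl(data, k) for k in range(1, 5)}
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

The entropy term is what separates the ICL from the BIC: models whose components overlap heavily are penalized, so the criterion tends to favor well-separated clusters, which is one plausible reason for using it when rare populations sit next to large ones.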

Keywords: automated multivariate clustering; iterative model-based clustering; probability based metaclustering; rare population detection.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Blood Cells / cytology*
  • Cluster Analysis
  • Computational Biology / methods*
  • Electronic Data Processing / methods*
  • Flow Cytometry / methods*
  • Humans
  • Multivariate Analysis