Feature-guided clustering of multi-dimensional flow cytometry datasets

Qing T Zeng; Juan Pablo Pratt; Jane Pak; Dino Ravnic; Harold Huss; Steven J Mentzer

doi:10.1016/j.jbi.2006.06.005

J Biomed Inform. 2007 Jun;40(3):325-31. doi: 10.1016/j.jbi.2006.06.005. Epub 2006 Jun 27.

Authors

Qing T Zeng¹, Juan Pablo Pratt, Jane Pak, Dino Ravnic, Harold Huss, Steven J Mentzer

Affiliation

¹ Decision Systems Group, Brigham and Women's Hospital, 310 Thorn Building, 75 Francis Street, Harvard Medical School, Boston, MA 02115, USA. qzeng@dsg.harvard.edu

PMID: 16901761
DOI: 10.1016/j.jbi.2006.06.005

Abstract

Background: Flow cytometry produces large multi-dimensional datasets of the physical and molecular characteristics of individual cells. The objective of this study was to simplify the cytometry datasets by arranging or clustering "objects" (cells) into a smaller number of relatively homogeneous groups (clusters) on the basis of interobject similarities and dissimilarities.

Results: The algorithm was designed to be driven by histogram features; that is, the relevant single-parameter histogram features were used to guide multidimensional k-means clustering without an a priori estimate of the number of clusters. To test this approach, we simulated cell-derived datasets using protein-coated microspheres (artificial "cells"). The microspheres were constructed to provide 119 populations in 40 samples. The feature-guided (FG) approach accurately identified 100% of the predetermined cluster combinations. In contrast, an approach based on the partition index (PI) cluster validity measure accurately identified 83.2% of the clusters. Direct comparisons of the two methods indicated that the FG method was significantly more accurate than the PI method in identifying both the number of clusters and the number of objects within the clusters (p < 0.0001).
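The idea of letting single-parameter histograms guide k-means can be illustrated with a minimal sketch. The abstract does not say which histogram features were used or how they were combined, so everything below is an assumption made for illustration: a "feature" is taken to be a mode (a run of adjacent well-populated bins), candidate centroids are all combinations of per-parameter modes, and candidates that attract too few events are discarded. The names `histogram_modes`, `assign`, `feature_guided_kmeans`, `bins`, and `min_frac` are hypothetical and do not come from the paper.

```python
from itertools import product
from math import dist


def histogram_modes(values, bins=20, min_frac=0.02):
    """Return one location per mode of a single-parameter histogram.

    Assumption: a mode is a maximal run of adjacent bins whose counts
    each reach min_frac of all events; its location is the mean of the
    events falling in that run.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant parameter
    counts = [0] * bins
    sums = [0.0] * bins
    for v in values:
        b = min(int((v - lo) / width), bins - 1)
        counts[b] += 1
        sums[b] += v
    threshold = min_frac * len(values)
    modes, run_n, run_sum = [], 0, 0.0
    # The trailing zero is a sentinel that closes a run ending in the last bin.
    for c, s in zip(counts + [0], sums + [0.0]):
        if c > 0 and c >= threshold:
            run_n += c
            run_sum += s
        elif run_n:
            modes.append(run_sum / run_n)
            run_n, run_sum = 0, 0.0
    return modes


def assign(points, centroids):
    """Index of the nearest centroid (Euclidean distance) for every point."""
    return [min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            for p in points]


def feature_guided_kmeans(points, bins=20, min_frac=0.02, max_iter=100):
    """k-means whose k and starting centroids come from histogram modes."""
    # One histogram per parameter; modes are found independently in each.
    per_dim_modes = [histogram_modes(d, bins, min_frac) for d in zip(*points)]
    # Every combination of per-parameter modes is a candidate centroid.
    seeds = [tuple(c) for c in product(*per_dim_modes)]
    labels = assign(points, seeds)
    min_events = min_frac * len(points)
    # Keep only candidates that attract enough events; this fixes k.
    centroids = [s for i, s in enumerate(seeds)
                 if labels.count(i) >= min_events]
    # Standard Lloyd iterations from the surviving candidates.
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                updated.append(tuple(sum(x) / len(members)
                                     for x in zip(*members)))
            else:
                updated.append(old)
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels
```

Calling `feature_guided_kmeans(points)` on a list of equal-length numeric tuples returns the final centroids and one cluster label per event. The number of clusters is never passed in; under the stated assumptions it falls out of the histogram modes. This sketch assumes populations are separated by empty or nearly empty bins in at least one parameter.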

Conclusion: We conclude that parameter feature analysis can be used to effectively guide k-means clustering of flow cytometry datasets.

MeSH terms

Algorithms
Animals
Antibodies / chemistry
Artificial Intelligence
Cluster Analysis*
Computational Biology / methods*
Flow Cytometry / methods*
Fuzzy Logic
Humans
Microspheres*
Models, Statistical
Neural Networks, Computer
Pattern Recognition, Automated
Programming Languages
Regression Analysis

Substances

Antibodies