Background: In multivariate distributions (for example, in flow cytometric datasets of three or more colors), it can become difficult or impossible to identify populations that differ between samples based only on a combination of univariate or bivariate displays. Indeed, it is possible that such differences can only be identified in "n"-dimensional space, where "n" is the number of parameters measured. Therefore, computer-assisted identification of such differences is necessary. Such a method could be used to identify responses (e.g., by comparing cell samples before and after stimulation) in exquisite detail by allowing complete analysis of the collected data on only those events that have responded.
Methods: Multivariate Probability Binning can be used to compare different datasets to identify the distance and statistical significance of a difference between the distributions. An intermediate step in the algorithm provides access to the actual locations within the n-dimensional comparison which are most different between the distributions. Gates based on collections of hyper-rectangular bins can then be applied to datasets, thereby selecting those events (or clusters of events) that are different between samples. We term this process Frequency Difference Gating.
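The gating idea described above can be illustrated with a minimal sketch. It is not the published implementation: the splitting rule (median cut along the highest-variance parameter of the control sample), the selection rule (keep bins where the test sample's event fraction exceeds a multiple of the control's), and every name and parameter (`build_bins`, `frequency_difference_gate`, `min_events`, `min_ratio`) are assumptions made for illustration. The sketch also omits the statistical significance calculation entirely.

```python
import numpy as np

def build_bins(control, min_events=50):
    """Partition parameter space into hyper-rectangular bins, each holding
    roughly equal numbers of control events (assumed splitting rule:
    median cut along the highest-variance parameter)."""
    n_dims = control.shape[1]
    bins = []
    stack = [(control, np.full(n_dims, -np.inf), np.full(n_dims, np.inf))]
    while stack:
        events, lo, hi = stack.pop()
        if len(events) < 2 * min_events:
            bins.append((lo, hi))
            continue
        dim = int(np.argmax(events.var(axis=0)))
        cut = np.median(events[:, dim])
        below = events[:, dim] < cut
        if not below.any():  # heavy ties: cannot split further
            bins.append((lo, hi))
            continue
        left_hi = hi.copy()
        left_hi[dim] = cut
        right_lo = lo.copy()
        right_lo[dim] = cut
        stack.append((events[below], lo, left_hi))
        stack.append((events[~below], right_lo, hi))
    return bins

def in_bin(events, lo, hi):
    """Boolean mask of events inside one bin (lower-inclusive, upper-exclusive)."""
    return np.all((events >= lo) & (events < hi), axis=1)

def bin_fractions(events, bins):
    """Fraction of events that fall in each bin."""
    counts = np.array([in_bin(events, lo, hi).sum() for lo, hi in bins])
    return counts / len(events)

def frequency_difference_gate(control, test, min_events=50, min_ratio=2.0):
    """Mask of test events lying in bins where the test sample is at least
    `min_ratio` times more frequent than the control (hypothetical rule)."""
    bins = build_bins(control, min_events)
    ctrl_frac = bin_fractions(control, bins)
    test_frac = bin_fractions(test, bins)
    selected = test_frac > min_ratio * ctrl_frac
    mask = np.zeros(len(test), dtype=bool)
    for (lo, hi), keep in zip(bins, selected):
        if keep:
            mask |= in_bin(test, lo, hi)
    return mask

# Synthetic demonstration: the test sample carries an extra cluster of
# 500 events that is absent from the control sample.
rng = np.random.default_rng(0)
control = rng.normal(size=(5000, 2))
test = np.vstack([
    rng.normal(size=(4500, 2)),
    rng.normal(loc=3.0, scale=0.3, size=(500, 2)),
])
gate = frequency_difference_gate(control, test)
print("extra cluster gated:", gate[-500:].mean())
print("background gated:", gate[:4500].mean())
```

Because the bins are built from the control sample alone, each holds a similar share of control events, so a bin's excess of test events directly reflects a local difference in frequency. The returned mask is the union of the selected hyper-rectangles applied to the test sample.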
Results: Frequency Difference Gating was used in several test scenarios to evaluate its utility. First, we compared PBMC subsets identified solely by immunofluorescence staining: based on this training data set, the algorithm automatically generated an accurate forward and side-scatter gate to identify lymphocytes. Second, we applied the algorithm to identify subtle differences between CD4 memory subsets based on 8-color immunophenotyping data. The resulting 3-dimensional gate could resolve cell subsets much more frequent in one subset than in the other; no combination of two-dimensional gates could accomplish this resolution. Finally, we used the algorithm to compare B cell populations derived from mice of different ages or strains, and found that the algorithm could find very subtle differences between the populations.
Conclusion: Frequency Difference Gating is a powerful tool that automates the process of identifying events comprising underlying differences between samples. It is not a clustering tool; it is not meant to identify subsets in multidimensional space. Importantly, this method may reveal subtle changes in small populations of cells, changes that only occur simultaneously in multiple dimensions in such a way that identification by univariate or bivariate analyses is impossible. Finally, the method may significantly aid in the analysis of high-order multivariate data (e.g., 6-12 color flow cytometric analyses), where identification of differences between datasets becomes so time-consuming as to be impractical. Published 2001 Wiley-Liss, Inc.