A simple strategy for sample annotation error detection in cytometry datasets

Cytometry A. 2022 Apr;101(4):351-360. doi: 10.1002/cyto.a.24525. Epub 2021 Dec 29.


Mislabeling samples or data with the wrong participant information can affect study integrity and lead investigators to draw inaccurate conclusions. Quality control to prevent these types of errors is commonly embedded into the analysis of genomic datasets, but a similar identification strategy is not standard for cytometric data. Here, we present a method for detecting sample identification errors in cytometric data using expression of human leukocyte antigen (HLA) class I alleles. We measured HLA-A*02 and HLA-B*07 expression in three longitudinal samples from each of 41 participants using a 33-marker CyTOF panel designed to identify major immune cell types. Three of 123 samples (2.4%) showed HLA allele expression that did not match their longitudinal pairs. Furthermore, the cytometric signatures of these same three samples did not match qPCR HLA class I allele data, suggesting that they were accurately identified as mismatches. We conclude that this technique is useful for detecting sample-labeling errors in cytometric analyses of longitudinal data. This technique could also be used in conjunction with another method, such as GWAS or PCR, to detect errors in cross-sectional data. We suggest that widespread adoption of this or similar techniques will improve the quality of clinical studies that utilize cytometry.
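The longitudinal consistency check described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each sample has already been reduced to a positive/negative call for HLA-A*02 and HLA-B*07 (the abstract does not say how expression is thresholded), and it uses a simple majority vote within each participant to pick out the sample that disagrees. The function name, field names, and example data are hypothetical.

```python
from collections import Counter, defaultdict


def flag_mismatched_samples(samples):
    """Return sorted sample IDs whose HLA profile disagrees with the
    majority profile of the same participant.

    Each sample is a dict with hypothetical keys:
    'sample_id', 'participant', 'a02' (bool), 'b07' (bool).
    """
    # Group samples by the participant they are labeled with.
    by_participant = defaultdict(list)
    for s in samples:
        by_participant[s["participant"]].append(s)

    flagged = []
    for group in by_participant.values():
        profiles = Counter((s["a02"], s["b07"]) for s in group)
        top_profile, top_count = profiles.most_common(1)[0]
        if top_count <= len(group) / 2:
            # No strict majority: the odd sample cannot be identified,
            # so every sample in the group is flagged for review.
            flagged.extend(s["sample_id"] for s in group)
        else:
            flagged.extend(
                s["sample_id"]
                for s in group
                if (s["a02"], s["b07"]) != top_profile
            )
    return sorted(flagged)


# Hypothetical example data: two participants, three time points each.
samples = [
    {"sample_id": "P01_T1", "participant": "P01", "a02": True, "b07": False},
    {"sample_id": "P01_T2", "participant": "P01", "a02": True, "b07": False},
    {"sample_id": "P01_T3", "participant": "P01", "a02": False, "b07": True},
    {"sample_id": "P02_T1", "participant": "P02", "a02": False, "b07": False},
    {"sample_id": "P02_T2", "participant": "P02", "a02": False, "b07": False},
    {"sample_id": "P02_T3", "participant": "P02", "a02": False, "b07": False},
]

print(flag_mismatched_samples(samples))  # ['P01_T3']
```

A check of this kind can only catch swaps between participants whose calls for the two alleles differ. Two participants with identical HLA-A*02 and HLA-B*07 status would look consistent even if their samples were exchanged, which is one reason a flagged sample would still be confirmed against an independent measurement such as the qPCR data mentioned above.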

Keywords: cytometry; human leukocyte antigen; quality control; reproducible research; sample mix-up; sample swap.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alleles
  • Cross-Sectional Studies*
  • Humans
  • Real-Time Polymerase Chain Reaction