Cytometry A. 2017 Jul;91(7):662-674.
doi: 10.1002/cyto.a.23144. Epub 2017 Jun 13.

Statistical Performance of Image Cytometry for DNA, Lipids, Cytokeratin, & CD45 in a Model System for Circulation Tumor Cell Detection

Free PMC article

Gregory L Futia et al. Cytometry A.

Abstract

Detection of circulating tumor cells (CTCs) in a blood sample is limited by the sensitivity and specificity of the biomarker panel used to identify CTCs over other blood cells. In this work, we present Bayesian theory that shows how test sensitivity and specificity set the rarity of cells that a test can detect. We perform our calculation of sensitivity and specificity on our image cytometry biomarker panel by testing on pure disease-positive (D+) populations (MCF7 cells) and pure disease-negative (D-) populations (leukocytes). In this system, we performed multi-channel confocal fluorescence microscopy to image biomarkers of DNA, lipids, CD45, and cytokeratin. Using custom software, we segmented our confocal images into regions of interest consisting of individual cells and computed the image metrics of total signal, second spatial moment, spatial frequency second moment, and the product of the spatial-spatial frequency moments. We present our analysis of these 16 features. The best performing of the 16 features produced an average separation of three standard deviations between D+ and D- and an average detectable rarity of ∼1 in 200. We performed multivariable regression and feature selection to combine multiple features for increased performance and showed an average separation of seven standard deviations between the D+ and D- populations, giving an average detectable rarity of ∼1 in 480. Histograms and receiver operating characteristic (ROC) curves for these features and regressions are presented. We conclude that simple regression analysis holds promise to further improve the separation of rare cells in cytometry applications. © 2017 International Society for Advancement of Cytometry.
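
The link between sensitivity, specificity, and detectable rarity described in the abstract can be illustrated with Bayes' theorem. The sketch below is not the authors' Ndet calculation, whose exact definition is not given here. It assumes, purely for illustration, that a cell type counts as "detectable" when a positive call is correct at least half the time (the hypothetical `target_ppv` parameter). The function names and the example sensitivity and specificity values are likewise assumptions.

```python
def posterior_positive(sensitivity, specificity, prevalence):
    """P(D+ | test positive) from Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def rarest_prevalence(sensitivity, specificity, target_ppv=0.5):
    """Smallest prevalence at which P(D+ | positive) reaches target_ppv.

    Assumption: 'detectable' means the posterior reaches target_ppv;
    this is an illustrative criterion, not the paper's definition.
    """
    false_pos_rate = 1.0 - specificity
    numerator = target_ppv * false_pos_rate
    denominator = sensitivity * (1.0 - target_ppv) + numerator
    return numerator / denominator


# Hypothetical operating point: 90% sensitivity, 99% specificity.
p_min = rarest_prevalence(0.9, 0.99)
print(f"detectable rarity: about 1 in {1.0 / p_min:.0f}")
print(f"posterior at that rarity: {posterior_positive(0.9, 0.99, p_min):.2f}")
```

Under this criterion the false positive rate dominates: halving it roughly doubles the rarity that can be detected, whereas sensitivity enters only as a near-unity scale factor.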

Keywords: biomarkers; circulating tumor cells; false positive rate; image cytometry; image processing; lipids; receiver operating characteristics; sensitivity; spatial features; specificity.

Conflict of interest statement

The authors have no conflicts of interest to declare. The funders had no role in the study design, data collection, analysis, decision to publish, or preparation of this manuscript.

Figures

Figure 1
Histograms of image cytometry features computed on all 4 channels. Blue shows WBC-only samples composed of 24,699 objects (D-); red shows MCF7-only samples (D+) composed of 41,091 objects used in the analysis. The black dashed line (MCF7 + WBC ~1:1 mixed samples containing 33,726 objects) is a qualitative control reproducing the modality of the pure samples. dBct = 10*log10(counts). Box plots of the distribution of cutoff positions maximizing Ndet across the 48 training subsets are shown on top of each histogram.
Figure 2
AUC and Cohen’s d, performance characteristics of the features, are shown for the 48 training subsets. An operating point maximizing Ndet was found on each training subset (thresholds plotted in Fig. 1). The remaining data from that day was used as a testing subset to compute the shown sensitivity, specificity and the minimum detectable thresholds at the operating point. Occurrences of false positive rates of zero on the testing data were found and summed in the bottom panel.
Figure 3
ROC curves for the training subsets averaged over each day. The logarithmic (z-scored) sensitivity and specificity axes show straight lines when the D+ and D- distributions are Gaussian. Average values of sensitivity and specificity maximizing Ndet are shown as + symbols and generally lie to the left of the visible inflection points.
Figure 4
Histograms of testing subsets of WBCs (blue traces) and MCF7 (red traces). The 1:1 mixed populations (black line) serve as a qualitative control showing that the regressed modality is real. Testing data was not used to train the regressions and was naive to the regressions, which were computed on the 48 training subsets. For each regression, an operating point maximizing Ndet was found on that training subset, producing the threshold positions shown as box plots on top of the histograms. Regressions are z-scored and thus center near 0.
Figure 5
Performance of the regression combinations over the 48 testing subsets, shown as box plots. The above-line performance statistics, AUC and Cohen’s d, characterize the separation produced between the biomarkers. The below-line performance statistics depend on the operating point. Occurrences where a testing subset had a false positive rate of 0 were summed. For Reg1–4, the below-line performance statistics are similar while the above-line statistics differ.
Figure 6
Receiver operating characteristics of the regressions computed on the testing subsets averaged over each day. Average positions of operating points maximizing Ndet are shown as + symbols. As more features are included in the regressions, performance and stability are seen to improve. Regressed distributions are not Gaussian, as indicated by their shape on the z-scored sensitivity-specificity axes.
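
The separation statistics named in the captions for Figures 2 and 5 (Cohen’s d and AUC), together with sensitivity and specificity at a threshold, can be sketched in a few lines. This is a minimal illustration rather than the authors' custom software: it assumes the pooled-standard-deviation form of Cohen’s d and the pairwise (Mann-Whitney) form of AUC, and the feature values are made up for the example.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(pos, neg):
    """Difference of means in units of the pooled standard deviation
    (assumed variant; the paper may use a different normalization)."""
    n1, n2 = len(pos), len(neg)
    pooled_var = ((n1 - 1) * variance(pos) + (n2 - 1) * variance(neg)) / (n1 + n2 - 2)
    return (mean(pos) - mean(neg)) / sqrt(pooled_var)


def auc(pos, neg):
    """Probability that a random D+ value exceeds a random D- value,
    counting ties as one half (equals the area under the ROC curve)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sens_spec(pos, neg, threshold):
    """Sensitivity and specificity when values >= threshold are called D+."""
    sensitivity = sum(p >= threshold for p in pos) / len(pos)
    specificity = sum(n < threshold for n in neg) / len(neg)
    return sensitivity, specificity


# Hypothetical feature values for a D+ and a D- population.
d_pos = [5.0, 6.0, 7.0, 8.0]
d_neg = [1.0, 2.0, 3.0, 6.0]

print("Cohen's d:", round(cohens_d(d_pos, d_neg), 3))
print("AUC:", auc(d_pos, d_neg))
print("sensitivity, specificity at 4.0:", sens_spec(d_pos, d_neg, 4.0))
```

Cohen’s d and AUC summarize the whole pair of distributions, whereas sensitivity and specificity depend on the chosen threshold, which matches the caption's distinction between above-line and below-line statistics.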
