Nat Mach Intell. 2021 Jul;3(7):590-600.
doi: 10.1038/s42256-021-00342-x. Epub 2021 May 20.

Segmentation of Neurons from Fluorescence Calcium Recordings Beyond Real-time


Yijun Bao et al. Nat Mach Intell. 2021 Jul.

Abstract

Fluorescent genetically encoded calcium indicators and two-photon microscopy help understand brain function by generating large-scale in vivo recordings in multiple animal models. Automatic, fast, and accurate active neuron segmentation is critical when processing these videos. In this work, we developed and characterized a novel method, Shallow U-Net Neuron Segmentation (SUNS), to quickly and accurately segment active neurons from two-photon fluorescence imaging videos. We used temporal filtering and whitening schemes to extract temporal features associated with active neurons, and used a compact shallow U-Net to extract spatial features of neurons. Our method was both more accurate and an order of magnitude faster than state-of-the-art techniques when processing multiple datasets acquired by independent experimental groups; the difference in accuracy was enlarged when processing datasets containing few manually marked ground truths. We also developed an online version, potentially enabling real-time feedback neuroscience experiments.


Conflict of interest statement

The authors declare no competing interests.

Figures

Extended Data Fig. 1. The average calcium response formed the temporal filter kernel.
We determined the temporal matched filter kernel by averaging calcium transients within a moderate SNR range; these transients likely represent the temporal response to single action potentials. (a) Example data show all background-subtracted fluorescence calcium transients of all GT neurons in all videos in the ABO 275 μm dataset that showed peak SNR (pSNR) in the regime 6 < pSNR < 8 (gray). We minimized crosstalk from neighboring neurons by excluding transients during time periods when neighboring neurons also had transients. We normalized all transients such that their peak values were unity, and then averaged these normalized transients into an averaged spike trace (red). We used the portion of the averaged spike trace above e⁻¹ (blue dashed line) as the final template kernel. (b) When analyzing performance on the ABO 275 μm dataset through 10-fold leave-one-out cross-validation, using the temporal kernel determined in (a) within our temporal filter scheme achieved a significantly higher F1 score than not using a temporal filter or using an unmatched filter (*p < 0.05, **p < 0.005; two-sided Wilcoxon signed-rank test, n = 10 videos) and achieved a slightly higher F1 score than using a single exponentially decaying kernel (p = 0.77; two-sided Wilcoxon signed-rank test, n = 10 videos). Error bars are standard deviations. The gray dots represent scores for the test data for each round of cross-validation. The unmatched filter was a moving-average filter over 60 frames. (c-d) are analogous to (a-b), but for the Neurofinder dataset. We determined the filter kernel using videos 04.01 and 04.01.test.
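The kernel-construction steps above (peak normalization, averaging, truncation at e⁻¹) can be sketched in a few lines of numpy. The helper names and the unit-sum normalization of the kernel are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def build_matched_kernel(transients):
    """Hypothetical helper: build a temporal matched-filter kernel from
    peak-aligned example calcium transients (one transient per row)."""
    transients = np.asarray(transients, dtype=float)
    # normalize each transient so its peak value is unity, then average
    peaks = transients.max(axis=1, keepdims=True)
    avg = (transients / peaks).mean(axis=0)
    # keep only the portion of the averaged trace above e^-1
    kernel = avg[avg > np.exp(-1)]
    # normalize the kernel (an assumption; keeps filter gain bounded)
    return kernel / kernel.sum()

def temporal_filter(trace, kernel):
    """Matched filtering is correlation, i.e. convolution with the
    time-reversed kernel."""
    return np.convolve(trace, kernel[::-1], mode="same")
```

The truncation at e⁻¹ keeps the kernel short, which matters because the filter runs on every pixel of every frame.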
Extended Data Fig. 2. The complexity of the CNN architecture controlled the tradeoff between speed and accuracy.
We explored multiple potential CNN architectures to optimize performance. (a-d) Various CNN architectures having depths of (a) two, (b) three, (c) four, or (d) five. For the three-depth architecture, we also tested different numbers of skip connections, ReLU (Rectified Linear Unit) instead of ELU (Exponential Linear Unit) as the activation function, and separable Conv2D instead of Conv2D in the encoding path. The dense five-depth model mimicked the model used in UNet2Ds. The legend “0/nᵢ+nᵢ” represents whether the skip connection was used (nᵢ+nᵢ) or not used (0+nᵢ). (e) The F1 score and processing speed of SUNS using various CNN architectures when analyzing the ABO 275 μm dataset through 10-fold leave-one-out cross-validation. The right panel zooms in on the rectangular region in the left panel. Error bars are standard deviations. The legend (n₁, n₂, …, nₖ) describes architectures with depth k and nᵢ channels at the i-th depth. We determined that the three-depth model, (4,8,16), using one skip connection at the shallowest layer, ELU, and full Conv2D (Fig. 1c), had a good trade-off between speed and accuracy; we used this architecture as the SUNS architecture throughout the paper. One important drawback of the ReLU activation function was its occasional (20% of the time) failure during training, compared to negligible failure levels for the ELU activation function.
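As a rough illustration of the architectural choices discussed above, the following numpy sketch computes the encoder feature-map shapes of a (4, 8, 16) three-depth model and defines the ELU activation. Both functions are simplified stand-ins under assumed 2x2 pooling, not the authors' code:

```python
import numpy as np

def elu(x, alpha=1.0):
    """ELU activation: unlike ReLU, it stays nonzero (and smooth) for
    x < 0, one plausible reason for the more stable training noted above."""
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def encoder_shapes(input_hw, channels=(4, 8, 16)):
    """Bookkeeping for encoder feature maps of a depth-k U-Net, assuming
    2x2 max pooling between depths and n_i channels at the i-th depth."""
    h, w = input_hw
    shapes = []
    for c in channels:
        shapes.append((h, w, c))
        h, w = h // 2, w // 2  # spatial dims halve at each depth
    return shapes
```

With one skip connection at the shallowest layer, the decoder's top stage would concatenate the 4-channel encoder map with the 4-channel upsampled map before its convolution.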
Extended Data Fig. 3. The F1 score of SUNS was robust to moderate variation of training and post-processing parameters.
We tested whether the accuracy of SUNS when analyzing the ABO 275 μm dataset within the 10-fold leave-one-out cross-validation relied on intricate tuning of the algorithm’s hyperparameters. The evaluated training parameters included (a) the threshold of the SNR video (th_SNR) and (b) the training batch size. The evaluated post-processing parameters included (c) the threshold of the probability map (th_prob), (d) the minimum neuron area (th_area), (e) the threshold of COM distance (th_COM), and (f) the minimum number of consecutive frames (th_frame). The solid blue lines are the average F1 scores, and the shaded regions are mean ± one standard deviation. When evaluating the post-processing parameters in (c-f), we fixed each parameter under investigation at the given values and simultaneously optimized the F1 score over the other parameters. Variations in these hyperparameters produced only small variations in the F1 performance. The orange lines show the F1 score (solid) ± one standard deviation (dashed) when we optimized all four post-processing parameters simultaneously. The similarity between the F1 scores on the blue lines and the scores on the orange lines suggests that optimizing three or four parameters simultaneously achieved similar optimized performance. Moreover, the relatively consistent F1 scores on the blue lines suggest that our algorithm did not rely on intricate hyperparameter tuning.
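To make the roles of the post-processing thresholds concrete, here is a minimal numpy sketch of the per-frame screening (probability threshold, then minimum area on connected regions) and the center-of-mass distance used for merging. The flood fill, 4-connectivity, and function names are illustrative, not the authors' implementation:

```python
import numpy as np

def postprocess_frame(prob_map, th_prob, th_area):
    """Threshold the CNN probability map, then keep only connected
    regions of at least th_area pixels (4-connectivity, simple flood
    fill). Returns a list of (row, col) pixel arrays, one per region."""
    active = prob_map > th_prob
    labels = np.zeros(active.shape, dtype=int)
    regions, current = [], 0
    for i in range(active.shape[0]):
        for j in range(active.shape[1]):
            if active[i, j] and labels[i, j] == 0:
                current += 1
                labels[i, j] = current
                stack, pix = [(i, j)], []
                while stack:
                    y, x = stack.pop()
                    pix.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < active.shape[0]
                                and 0 <= nx < active.shape[1]
                                and active[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = current
                            stack.append((ny, nx))
                if len(pix) >= th_area:  # discard regions below th_area
                    regions.append(np.array(pix))
    return regions

def com_distance(r1, r2):
    """Distance between centers of mass; regions closer than th_COM
    (and recurring in th_frame consecutive frames) would be merged."""
    return float(np.linalg.norm(r1.mean(axis=0) - r2.mean(axis=0)))
```

A production version would use a vectorized connected-component routine, but the thresholds play exactly these roles.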
Extended Data Fig. 4. The performance of SUNS was better than that of other methods in the presence of intensity noise or motion artifacts.
The (a, d) recall, (b, e) precision, and (c, f) F1 score of all the (a-c) batch and (d-f) online segmentation algorithms in the presence of increasing intensity noise. The test dataset was the ABO 275 μm data with added random noise. The relative noise strength was represented by the ratio of the standard deviation of the random noise amplitude to the mean fluorescence intensity. As expected, the F1 scores of all methods decreased as the noise amplitude grew. The F1 score of SUNS was greater than those of all other methods at all noise intensities. (g-l) are in the same format as (a-f), but show the performance in the presence of increasing motion artifacts. The motion artifact strength was represented by the standard deviation of the random movement amplitude (unit: pixels). As expected, the F1 scores of all methods decreased as the motion artifacts became stronger. The F1 score of SUNS was greater than those of all other methods at all motion amplitudes. STNeuroNet and CaImAn Batch were the most sensitive to strong motion artifacts, likely because they rely on accurate 3D spatiotemporal structures of the video. In contrast, SUNS relied more on the 2D spatial structure, so it retained its accuracy better when spatial structures changed position across frames.
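One plausible way to generate such perturbed test data (an assumption about the exact procedure, which the caption only summarizes) is to add zero-mean Gaussian intensity noise scaled to the mean intensity, and to translate each frame by random integer offsets:

```python
import numpy as np

def add_intensity_noise(video, rel_strength, rng=None):
    """Add Gaussian noise whose standard deviation is rel_strength times
    the mean fluorescence intensity (illustrative perturbation)."""
    if rng is None:
        rng = np.random.default_rng(0)
    sigma = rel_strength * video.mean()
    return video + rng.normal(0.0, sigma, size=video.shape)

def add_motion_artifacts(video, shift_std, rng=None):
    """Shift each frame by random integer offsets drawn with standard
    deviation shift_std pixels, simulating residual motion."""
    if rng is None:
        rng = np.random.default_rng(0)
    out = np.empty_like(video)
    for k, frame in enumerate(video):
        dy, dx = np.round(rng.normal(0.0, shift_std, size=2)).astype(int)
        out[k] = np.roll(np.roll(frame, dy, axis=0), dx, axis=1)
    return out
```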
Extended Data Fig. 5. SUNS accurately mapped the spatial extent of each neuron even if the spatial footprints of neighboring cells overlapped.
SUNS segmented active neurons within each individual frame, and then accurately collected and merged the instances belonging to the same neurons. We selected two example pairs of overlapping neurons from the ABO video 539670003 identified by SUNS, and showed their traces and instances when they were activated independently. (a) The SNR images of the region surrounding the selected neurons. The left image is the maximum projection of the SNR video over the entire recording time, which shows the two neurons were active and overlapping. The right images are single-frame SNR images at two different time points, each at the peak of a fluorescence transient where only one of the two neurons was active. The segmentation of each neuron generated by SUNS is shown as a contour with a different color. The scale bar is 3 μm. (b) The temporal SNR traces of the selected neurons, matched to the colors of their contours in (a). Because the pairs of neurons overlapped, their fluorescence traces displayed substantial crosstalk. The dash markers above each trace show the active periods of each neuron determined by SUNS. The colored triangles below each trace indicate the manually-selected time of the single-frame images shown in (a). (c-d) are parallel to (a-b), but for a different overlapping neuron pair. (e) We quantified the ability to find overlapped neurons for each segmentation algorithm using the recall score. We divided the ground truth neurons in all the ABO videos into two groups: neurons without and with overlap with other neurons. We then computed the recall scores for both groups. The recall of SUNS on spatially overlapping neurons was not significantly lower (and was numerically higher) than the recall of SUNS on non-spatially overlapping neurons (p > 0.8, one-sided Wilcoxon rank-sum test, n = 10 videos; n.s.l. – not significantly lower). 
Therefore, the performance of SUNS on overlapped neurons was at least as good as its performance on non-overlapped neurons. Moreover, the recall scores of SUNS in both groups were comparable to or significantly higher than those of the other methods in those groups (**p < 0.005, n.s. – not significant; two-sided Wilcoxon signed-rank test, n = 10 videos; error bars are standard deviations). The gray dots represent the scores on the test data for each round of cross-validation.
Extended Data Fig. 6. Each pre-processing step and the CNN contributed to the accuracy of SUNS at the cost of lower speed.
We evaluated the contribution of each pre-processing option (spatial filtering, temporal filtering, and SNR normalization) and the CNN option to SUNS. The reference algorithm (SUNS) used all options except spatial filtering. We compared the performance of this reference algorithm to the performance with additional spatial filtering (optional SF), without temporal filtering (no TF), without SNR normalization (no SNR), and without the CNN (no CNN) when analyzing the ABO 275 μm dataset through 10-fold leave-one-out cross-validation. (a) The recall, precision, and F1 score of these variants. The temporal filtering, SNR normalization, and CNN each significantly contributed to the overall accuracy, but the impact of spatial filtering was not significant (*p < 0.05, **p < 0.005, n.s. - not significant; two-sided Wilcoxon signed-rank test, n = 10 videos; error bars are standard deviations). The gray dots represent the scores on the test data for each round of cross-validation. (b) The speed and F1 score of these variants. Eliminating temporal filtering or the CNN significantly increased the speed, while adding spatial filtering or eliminating SNR normalization significantly lowered the speed (**p < 0.005; two-sided Wilcoxon signed-rank test, n = 10 videos; error bars are standard deviations). The light color dots represent F1 scores and speeds for the test data for each round of cross-validation. The execution of SNR normalization was fast (~0.07 ms/frame). However, eliminating SNR normalization led to a much lower optimal th_prob, and thus increased the number of active pixels and decreased precision. In addition, “no SNR” had a lower speed than the complete SUNS algorithm due to the increased post-processing workload for managing the additional active pixels and regions.
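The SNR normalization discussed above amounts to subtracting a pixelwise baseline and dividing by a pixelwise noise estimate. The sketch below uses a median baseline and a noise estimate from below-baseline samples (which are rarely contaminated by calcium transients); these particular estimators are assumptions, not the authors' implementation:

```python
import numpy as np

def snr_normalize(video):
    """Pixelwise SNR normalization sketch for a (time, height, width)
    video: (F - baseline) / noise."""
    baseline = np.median(video, axis=0)
    resid = video - baseline
    # estimate noise from the spread of negative residuals only,
    # since calcium transients are positive-going
    neg = np.where(resid < 0, resid, np.nan)
    noise = np.nanstd(neg, axis=0) * np.sqrt(2.0)  # rough two-sided scale
    noise = np.where(noise > 0, noise, 1.0)        # guard against zeros
    return resid / noise
```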
Extended Data Fig. 7. The recall, precision, and F1 score of SUNS were superior to those of other methods on a variety of datasets.
(a) Training on one ABO 275 μm video and testing on nine ABO 275 μm videos (each data point is the average over each set of nine test videos, n = 10); (b) Training on ten ABO 275 μm videos and testing on ten ABO 175 μm videos (n = 10); (c) Training on one Neurofinder video and testing on one paired Neurofinder video (n = 12); (d) Training on three-quarters of one CaImAn video and testing on the remaining quarter of the same CaImAn video (n = 16). The F1 scores of SUNS were, in most cases, significantly higher than those of the other methods (*p < 0.05, **p < 0.005, ***p < 0.001, n.s. - not significant; two-sided Wilcoxon signed-rank test; error bars are standard deviations). The gray dots represent the individual scores for each round of cross-validation.
Extended Data Fig. 8. SUNS online outperformed CaImAn Online in accuracy and speed when processing a variety of datasets.
(a, e) Training on one ABO 275 μm video and testing on nine ABO 275 μm videos (each data point is the average over each set of nine test videos, n = 10); (b, f) Training on ten ABO 275 μm videos and testing on ten ABO 175 μm videos (n = 10); (c, g) Training on one Neurofinder video and testing on one paired Neurofinder video (n = 12); (d, h) Training on three-quarters of one CaImAn video and testing on the remaining quarter of the same CaImAn video (n = 16). The F1 score and processing speed of SUNS online were significantly higher than the F1 score and speed of CaImAn Online (**p < 0.005, ***p < 0.001; two-sided Wilcoxon signed-rank test; error bars are standard deviations). The gray dots in (a-d) represent individual scores for each round of cross-validation. The light color dots in (e-g) represent F1 scores and speeds for the test data for each round of cross-validation. The light color markers in (h) represent F1 scores and speeds for the test data for each round of cross-validation performed on different CaImAn videos. We updated the baseline and noise regularly after initialization for the Neurofinder dataset, but did not do so for other datasets.
Extended Data Fig. 9. Changing the frequency of updating the neuron masks modulated trade-offs between SUNS online’s response time to new neurons and SUNS online’s performance metrics.
The (a-c) F1 score and (d-f) speed of SUNS online increased as the number of frames per update (n_merge) increased for the (a, d) ABO 275 μm, (b, e) Neurofinder, and (c, f) CaImAn datasets. The solid line is the average, and the shading is one standard deviation from the average (n = 10, 12, and 16 cross-validation iterations for the three datasets). In (a-c), the green lines show the F1 score (solid) ± one standard deviation (dashed) of SUNS batch. The F1 score and speed generally increased as n_merge increased. For example, the F1 score and speed when using n_merge = 500 were higher than when using n_merge = 20, and half of the differences were significant (*p < 0.05, **p < 0.005, ***p < 0.001, n.s. - not significant; two-sided Wilcoxon signed-rank test; n = 10, 12, and 16, respectively). We updated the baseline and noise regularly after initialization for the Neurofinder dataset, but did not do so for the other datasets. The value of n_merge was inversely proportional to the update frequency, and hence to the responsiveness of SUNS online to the appearance of new neurons. A trade-off therefore exists between this responsiveness and the accuracy and speed of SUNS online. At the cost of lower responsiveness, a higher n_merge allowed the accumulation of temporal information and improved the accuracy of neuron segmentation. Likewise, a higher n_merge improved the speed because it reduced the frequency of computations for aggregating neurons.
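The role of n_merge in the online loop can be sketched as follows: per-frame segmentations are buffered and only merged into neuron masks once every n_merge frames. The `segment_frame` and `merge` callables are hypothetical stand-ins for the corresponding SUNS stages, not the authors' API:

```python
def online_segment(frames, segment_frame, merge, n_merge=100):
    """Sketch of the online loop: segment each incoming frame, buffer
    the per-frame regions, and merge them into the running neuron list
    only once every n_merge frames."""
    neurons, buffer = [], []
    for t, frame in enumerate(frames, start=1):
        buffer.extend(segment_frame(frame))
        if t % n_merge == 0:
            # infrequent merging -> faster, and more accumulated
            # temporal evidence per merge; but slower response to
            # newly appearing neurons
            neurons = merge(neurons, buffer)
            buffer = []
    if buffer:  # flush any remainder at the end of the recording
        neurons = merge(neurons, buffer)
    return neurons
```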
Extended Data Fig. 10. Updating the baseline and noise after initialization increased the accuracy of SUNS online at the cost of lower speed.
We compared the F1 score and speed of SUNS online with or without baseline and noise updates for the (a) ABO 275 μm, (b) Neurofinder, and (c) CaImAn datasets. The F1 scores with baseline and noise updates were generally higher, but the speeds were lower (*p < 0.05, **p < 0.005, ***p < 0.001, n.s. - not significant; two-sided Wilcoxon signed-rank test; error bars are standard deviations). The light color dots represent F1 scores and speeds for the test data for each round of cross-validation. The improvement in the F1 score was larger when the baseline fluctuations were more significant. (d) Example processing time per frame of SUNS online with baseline and noise updates on Neurofinder video 02.00. The lower inset zooms in on the data from the red box. The upper inset is the distribution of processing time per frame. The processing time per frame was consistently faster than the microscope recording rate (125 ms/frame). The first few frames after initialization were faster than the following frames, because the baseline and noise update was not performed in these frames.
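A running baseline and noise update can be sketched with exponential moving averages; this particular update rule is an assumption for illustration (the caption does not specify the estimator), but it shows why updating costs time yet tracks slow baseline drift:

```python
import numpy as np

def running_baseline_noise(trace, alpha=0.01, b0=0.0, s0=1.0):
    """Illustrative per-sample baseline/noise tracker for one pixel's
    trace: exponential moving averages of the signal (baseline) and of
    the squared residual (noise power)."""
    b, v = b0, s0 ** 2
    baselines, noises = [], []
    for x in trace:
        b = (1 - alpha) * b + alpha * x             # track slow drift
        v = (1 - alpha) * v + alpha * (x - b) ** 2  # track noise power
        baselines.append(b)
        noises.append(np.sqrt(v))
    return np.array(baselines), np.array(noises)
```

Running this per pixel on every frame explains the speed cost; skipping it explains the accuracy loss when the baseline fluctuates.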
Figure 1: Schematic for the proposed fast neuron segmentation algorithm based on a shallow U-Net.
(a) The overall structure of SUNS included (b) pre-processing, (c) CNN inference, and (d) post-processing. The input is a raw registered video, and the output is a set of binary masks representing segmented neurons. (b) The pre-processing procedure used optional spatial filtering to remove large-scale background fluctuations, temporal filtering to enhance calcium transients, and SNR normalization to remove inactive neurons. The output of the pre-processing was an SNR video used for CNN inference. (c) Our CNN employed a shallow U-Net. Example dimensions of each feature map are at the left of each row. The numbers of channels in each feature map are on top of each feature. The arrows denote different local tensor operations. The output of the CNN was a probability map used in post-processing. (d) The post-processing procedure screened for active pixels, spatially grouped active pixels in each frame into active connected regions, and temporally merged active regions belonging to the same neuron across all frames.
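The three stages in (a) compose as a simple pipeline. The sketch below is purely structural; all four callables are hypothetical stand-ins for the stages described above, not the authors' API:

```python
def suns_batch(video, preprocess, cnn_infer, postprocess):
    """Top-level sketch of the batch pipeline: raw registered video in,
    binary neuron masks out."""
    snr_video = preprocess(video)     # (b) pre-processing -> SNR video
    prob_maps = cnn_infer(snr_video)  # (c) CNN inference -> probability maps
    masks = postprocess(prob_maps)    # (d) post-processing -> neuron masks
    return masks
```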
Figure 2: SUNS outperformed existing neuron segmentation algorithms in accuracy and speed on the ABO dataset.
(a) Example segmentations from a part of ABO video 524691284 for SUNS, STNeuroNet, CaImAn Batch, and Suite2p, overlaid on top of the imaging data. The grayscale image is the projection of the maximum pixel-wise SNR (Scale bar: 20 μm). The light orange outlines denote the GT neurons, and the other colors denote the neurons found by the algorithms. (b) Example neurons zoomed from the boxed regions in (a) that were identified correctly by SUNS but missed by the other methods. The images are the average SNR images around the peaks of one calcium transient from each of the pictured neurons (Scale bar: 3 μm; Supplementary Fig. 1). (c) The recall, precision, and F1 score of SUNS during a 10-round cross-validation were superior to those of the other methods and the independent human Grader 3 (**p < 0.005, two-sided Wilcoxon signed-rank test, n = 10 videos; error bars are standard deviations). The gray dots represent scores for the test data for each round of cross-validation. (d) SUNS required less training time than the other methods over both a single cross-validation round and a 10-round average. (e) In addition to superior detection accuracy, SUNS had faster processing speed than the other methods and the video rate (**p < 0.005; two-sided Wilcoxon signed-rank test, n = 10 videos; error bars are standard deviations). The light color dots represent F1 scores and speeds for the test data for each round of cross-validation.
Figure 3: SUNS outperformed existing neuron segmentation algorithms in accuracy and speed when processing a variety of datasets.
(a-d) The F1 score and processing speed of SUNS were better than those of STNeuroNet, CaImAn Batch, and Suite2p on four tests of generalization: (a) training on one ABO 275 μm video and testing on nine ABO 275 μm videos (each data point is the average over each set of nine test videos, n = 10); (b) training on ten ABO 275 μm videos and testing on ten ABO 175 μm videos (n = 10); (c) training on one Neurofinder video and testing on one paired Neurofinder video (n = 12); (d) training on three-quarters of one CaImAn video and testing on the remaining quarter of the same CaImAn video (n = 16) (for F1 score and processing speed, *p < 0.05, **p < 0.005, ***p < 0.001, n.s. - not significant; two-sided Wilcoxon signed-rank test; error bars are standard deviations). The light color dots in (a-c) represent F1 scores and speeds for the test data for each round of cross-validation. The light color markers in (d) represent F1 scores and speeds for the test data for each round of cross-validation performed on different CaImAn videos. (e) Example segmentations from the fourth quadrant of the CaImAn video J123 compare the results of all four segmentation methods, overlaid on top of the imaging data. The grayscale image is the projection of the maximum pixel-wise SNR (Scale bar: 20 μm). The light orange outlines denote the GT neurons, and the other colors denote the neurons found by the algorithms. (f) The example neuron zoomed from the boxed region in (e) was identified correctly by SUNS but missed by the other methods. The image is the average SNR image around the peak of a calcium transient of the neuron (Scale bar: 3 μm; Supplementary Fig. 1).
Figure 4: SUNS online outperformed CaImAn Online in accuracy and speed on the ABO dataset.
(a) Example segmentations from ABO video 524691284 compare the results of SUNS online to the results of CaImAn Online, overlaid on top of the imaging data. The grayscale image is the projection of the maximum pixel-wise SNR (Scale bar: 20 μm). The light orange outlines denote the GT neurons, and the other colors denote the neurons found by the algorithms. (b) Example neurons zoomed from the boxed regions in (a), the same region as in Fig. 2b. The images are the average SNR images around the peaks of one calcium transient from each of the pictured neurons (Scale bar: 3 μm; Supplementary Fig. 1). (c) The F1 scores of SUNS online during a 10-round cross-validation were superior to those of CaImAn Online and the independent human Grader 3, and were close to those of SUNS batch (**p < 0.005, n.s. - not significant; two-sided Wilcoxon signed-rank test, n = 10 videos; error bars are standard deviations). The gray dots represent the scores on the test data for each round of cross-validation. (d) In addition to superior detection accuracy, SUNS online was also faster than CaImAn Online, although slower than SUNS batch (**p < 0.005, two-sided Wilcoxon signed-rank test, n = 10 videos; error bars are standard deviations). The light color dots represent the F1 scores and speeds for the test data for each round of cross-validation. (e) Example processing time per frame when applying SUNS online on video 501574836. The black dashed line is the microscope recording rate. The lower inset zooms in on the data from the red box. The upper inset is the distribution of processing time per frame.
