BMC Biol. 2022 Nov 30;20(1):263.
doi: 10.1186/s12915-022-01453-6.

Synthetic Micrographs of Bacteria (SyMBac) allows accurate segmentation of bacterial cells using deep neural networks

Georgeos Hardo et al. BMC Biol. 2022.

Abstract

Background: Deep-learning-based image segmentation models are required for accurate processing of high-throughput timelapse imaging data of bacterial cells. However, the performance of any such model strictly depends on the quality and quantity of training data, which is difficult to generate for bacterial cell images. Here, we present a novel method of bacterial image segmentation using machine learning models trained with Synthetic Micrographs of Bacteria (SyMBac).

Results: We have developed SyMBac, a tool that allows for rapid, automatic creation of arbitrary amounts of training data, combining detailed models of cell growth, physical interactions, and microscope optics to create synthetic images which closely resemble real micrographs, and is capable of training accurate image segmentation models. The major advantages of our approach are as follows: (1) synthetic training data can be generated virtually instantly and on demand; (2) these synthetic images are accompanied by perfect ground truth positions of cells, meaning no data curation is required; (3) different biological conditions, imaging platforms, and imaging modalities can be rapidly simulated, meaning any change in one's experimental setup no longer requires the laborious process of manually generating new training data for each change. Deep-learning models trained with SyMBac data are capable of analysing data from various imaging platforms and are robust to drastic changes in cell size and morphology. Our benchmarking results demonstrate that models trained on SyMBac data generate more accurate cell identifications and precise cell masks than those trained on human-annotated data, because the model learns the true position of the cell irrespective of imaging artefacts. We illustrate the approach by analysing the growth and size regulation of bacterial cells during entry and exit from dormancy, which revealed novel insights about the physiological dynamics of cells under various growth conditions.

Conclusions: The SyMBac approach will help to adapt and improve the performance of deep-learning-based image segmentation models for accurate processing of high-throughput timelapse image data.

Keywords: Bacterial cell imaging; Cell segmentation; Deep-learning; High-throughput imaging; Image analysis; Microfluidics; Synthetic images; Timelapse microscopy.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
The synthetic image generation process: a Schematic of linear colonies of cells in a microfluidic device known colloquially as the mother machine. b Synthetic image generation pipeline: rigid body physics simulations are combined with agent-based modelling to simulate bacterial growth in the device. These simulations are convolved with the microscope’s point spread function, which is generated using known parameters of the objective lens. This output image is then further optimised to match real images. Scale bar = 1 μm. c Synthetic data can be adapted to different biological conditions, variations in microfluidic designs, and imaging modalities. With real data, many experiments would need to be conducted to generate training data with the same kind of coverage. Scale bar = 1 μm. d Typical timescales for individual steps in the generation of training data. e Humans annotating images had variable performances and consistently undersegmented cells, especially in small stationary phase cells. f SyMBac is approximately 10,000× faster than a human at generating training data (10,000 images in less than 10 min)
Fig. 2
Different synthetic image modalities: a Synthetic data can be generated for microfluidic devices that produce monolayer colonies, in this case the microfluidic turbidostat described in [28] (real image courtesy of Elf Lab, Uppsala University). Scale bar = 2 μm. b SyMBac can also generate timelapse image data for the growth of monolayer colonies on agar pads. Scale bar = 2 μm
Fig. 3
Model training, evaluation, and timing benchmarks: a Schematic of the U-net model being trained using synthetic data and then segmenting real data to produce accurate masks. b SyMBac can retrain generalised models, such as Omnipose (a derivative of Cellpose, allowing for mask reconstruction from arbitrary morphologies). Because Omnipose was not trained on any microfluidic device images, it fails to properly segment the image, attaching masks to the mother machine trench geometry (though it admirably segments cells within the trench). Retraining Omnipose with SyMBac’s synthetic data results in near perfect segmentation, with no more trench artefacts. c A typical time to train the network, either Omnipose or DeLTA (on 2000 images) and segment approximately one million images (Nvidia GeForce 1080Ti)
Fig. 4
Model quality and segmentation precision: a The masks from SyMBac-trained models are truer to the geometry of the cells, displaying no aberration when compared to model outputs trained on human-annotated data. b The masks also maintain a narrow distribution of widths, while the masks from DeLTA trained with human-annotated data display a wide variation with the peak shifted to lower values and show 2.5× higher variation in cell width. c Examples of the type of data which can be segmented using a single SyMBac-trained model. In this case, we show the performance of a single DeLTA model trained on combination data across 3 different cell sizes. Scale bar = 2 μm. d The SyMBac-trained model produces masks with precisions of 40 nm for length and 19 nm for width. This is calculated by fitting a line to the length and width trace of cells in the stationary phase
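The line-fitting step in panel d can be illustrated with a minimal sketch. It assumes that precision is taken as the standard deviation of the residuals about the fitted line, which the caption does not state explicitly. The function names and the simulated trace are hypothetical and are not part of SyMBac.

```python
import random

def fit_line(times, values):
    """Ordinary least-squares slope and intercept for a 1D trace."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t

def trace_precision(times, values):
    """Residual standard deviation about the fitted line (assumed precision measure)."""
    slope, intercept = fit_line(times, values)
    residuals = [v - (slope * t + intercept) for t, v in zip(times, values)]
    # n - 2 degrees of freedom because slope and intercept were both fitted
    return (sum(r * r for r in residuals) / (len(times) - 2)) ** 0.5

# Hypothetical stationary-phase length trace in micrometres, one frame per minute,
# with 0.040 um of simulated measurement noise on a slow linear drift.
rng = random.Random(0)
times = [float(t) for t in range(240)]
lengths = [2.0 + 0.0002 * t + rng.gauss(0.0, 0.040) for t in times]
precision_nm = 1000.0 * trace_precision(times, lengths)
print(f"estimated length precision: {precision_nm:.1f} nm")
```

Fitting a line first removes any slow drift in cell size, so the residual scatter reflects frame-to-frame measurement noise rather than real growth or shrinkage.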
Fig. 5
a Models trained with SyMBac were used to segment single-cell data throughout all growth curve regimes (colour-coded and used throughout the figure). b Example kymographs of 100× data showing cells in a variety of states (exponential growth, stationary phase, filamenting) with accompanying masks, highlighting the robustness of the model trained on mixed data to segment cells of multiple cell sizes and morphologies. c An example output showing the coordinate system applied to a cell mask, generated by Colicoords [16], allowing for highly accurate length and width prediction. d Example time series of the size of a single cell going through an entire growth curve. The inset shows cell length changes during the stationary phase. e–i During the exponential phase, cells exponentially increase their size with a mean growth rate of 2.6 volume doublings per hour, which is equivalent to a population doubling time of 23 min, consistent with the bulk growth measurements of cells in this richly defined medium [3]. The distribution of growth rates shifts to the left as cells enter the stationary phase (orange and green phase), and growth eventually stops 6 h into the stationary phase (pink). For all growth rates, corresponding standard deviations are also reported. j Cells show a wide distribution of lengths during the exponential phase which narrows greatly during entry to the stationary phase, as cells are “locked in” to their width. Interestingly, while the mean width decreases in the stationary phase, the variability in cell widths increases. k Example of a cell exiting stationary phase, showing the increase in length and width. l Comparison of initial length and the added length before the first division after exit from stationary phase shows that cells act as noisy sizers with respect to length regulation. m Comparison of the initial width and the added width before the division shows that E. coli is an almost perfect width-sizer, dividing only when individual cells reach a critical width
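The conversion between the growth rate and the doubling time quoted for the exponential phase is simple arithmetic: 60 min divided by 2.6 doublings per hour gives roughly 23 min. A minimal sketch follows. The function names are illustrative and do not come from the paper, and the second helper assumes purely exponential growth between two size measurements.

```python
import math

def doubling_time_minutes(doublings_per_hour):
    """Convert a growth rate in doublings per hour to a doubling time in minutes."""
    return 60.0 / doublings_per_hour

def growth_rate_doublings_per_hour(size_start, size_end, elapsed_minutes):
    """Growth rate of a cell growing exponentially between two size measurements."""
    return math.log2(size_end / size_start) / (elapsed_minutes / 60.0)

print(round(doubling_time_minutes(2.6), 1))  # 23.1
```

For instance, a cell whose volume doubles in 30 min has a growth rate of 2 doublings per hour under this definition.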
Fig. 6
Extensions of SyMBac for cell segmentation in images of linear colonies of B. subtilis (very straight cells, unlike E. coli which have more curvature), monolayer colonies in a 2D microfluidic turbidostat chamber (data from [28], kindly provided by the Elf Lab, Uppsala University), growing colonies on agar pad, and low-resolution fluorescence snapshots of dense cell clusters on agar pad. Scale bar = 2 μm
Fig. 7
Block diagram of the image generation pipeline. The cell spherocylinder image is first morphed using the roll function and multiplied by Ic, the empirical cell intensity. To this image is added the trench OPL image, which is multiplied by It, the empirical trench intensity. Finally, the media image is added with those pixels being multiplied by Im, the empirical media intensity. These steps are described in detail in Additional file 1, Section 3. The PSF, which has been altered by Gaussian apodisation and simulated defocus, is then convolved over the image. The precise implementation of the PSF and its modifications are described in Additional file 1, Section 4. The camera-modelled shot noise and read noise are then added to the image (implementation details in Additional file 1, Section 6 [39]), with optional Fourier and intensity matching occurring afterwards (Additional file 1, Section 7 [40]). Combined, this produces a synthetic image realistic enough to train highly accurate models for image segmentation of real data
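The order of operations in this block diagram can be sketched in NumPy as: weighted sum of components, convolution with a PSF, then camera noise. This is only an illustration under strong simplifying assumptions and is not SyMBac's implementation. Binary masks stand in for the optical path length images, and a plain Gaussian kernel stands in for the apodised, defocused PSF. The roll morphing and the optional Fourier and intensity matching are omitted, and all names and parameter values are hypothetical.

```python
import numpy as np

def gaussian_psf(shape, sigma):
    """Normalised Gaussian kernel centred in the array (stand-in for the real PSF)."""
    rows, cols = shape
    y = np.arange(rows) - rows // 2
    x = np.arange(cols) - cols // 2
    yy, xx = np.meshgrid(y, x, indexing="ij")
    kernel = np.exp(-(yy**2 + xx**2) / (2.0 * sigma**2))
    return kernel / kernel.sum()

def synthesise(cell, trench, media, i_c, i_t, i_m,
               psf_sigma=1.5, photons=200.0, read_noise=2.0, seed=0):
    """Return (noisy image, noise-free blurred image) for three component masks."""
    rng = np.random.default_rng(seed)
    # Weighted sum of the components (Ic, It and Im in the caption)
    scene = i_c * cell + i_t * trench + i_m * media
    # Circular convolution via FFT; ifftshift moves the kernel centre to index (0, 0)
    psf = gaussian_psf(scene.shape, psf_sigma)
    spectrum = np.fft.fft2(scene) * np.fft.fft2(np.fft.ifftshift(psf))
    blurred = np.clip(np.real(np.fft.ifft2(spectrum)), 0.0, None)
    # Shot noise as Poisson photon counts, then additive Gaussian read noise
    noisy = rng.poisson(blurred * photons).astype(float)
    noisy += rng.normal(0.0, read_noise, size=noisy.shape)
    return noisy, blurred

# Toy, non-overlapping masks: a media-filled channel containing one rectangular cell,
# with the remaining pixels assigned to the trench component.
shape = (64, 32)
cell = np.zeros(shape)
cell[20:40, 12:20] = 1.0
media = np.zeros(shape)
media[:, 10:22] = 1.0
media -= cell
trench = 1.0 - media - cell

noisy, blurred = synthesise(cell, trench, media, i_c=0.3, i_t=0.6, i_m=1.0)
```

Because the components are combined before blurring, the ground-truth cell mask is known exactly regardless of how the optics and noise degrade the final image.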

References

    1. Robert L, Ollion J, Robert J, Song X, Matic I, Elez M. Mutation dynamics and fitness effects followed in single cells. Science. 2018;359(6381):1283–1286. doi: 10.1126/science.aan0797.
    2. Lord ND, Norman TM, Yuan R, Bakshi S, Losick R, Paulsson J. Stochastic antagonism between two proteins governs a bacterial cell fate switch. Science. 2019;366(6461):116–120. doi: 10.1126/science.aaw4506.
    3. Bakshi S, Leoncini E, Baker C, Cañas-Duarte SJ, Okumus B, Paulsson J. Tracking bacterial lineages in complex and dynamic environments with applications for growth control and persistence. Nat Microbiol. 2021;6(6):783–791. doi: 10.1038/s41564-021-00900-4.
    4. Luro S, Potvin-Trottier L, Okumus B, Paulsson J. Isolating live cells after high-throughput, long-term, time-lapse microscopy. Nat Methods. 2019;17(1):93–100. doi: 10.1038/s41592-019-0620-7.
    5. Niederholtmeyer H, Sun ZZ, Hori Y, Yeung E, Verpoorte A, Murray RM, et al. Rapid cell-free forward engineering of novel genetic ring oscillators. eLife. 2015;4.
