We present single-cell clustering using bifurcation analysis (SCUBA), a novel computational method for extracting lineage relationships from single-cell gene expression data and for modeling the dynamic changes associated with cell differentiation. SCUBA draws on techniques from nonlinear dynamics and the theory of stochastic differential equations, providing a systematic framework for modeling complex processes that involve multilineage specification. By applying SCUBA to two complementary, publicly available datasets, we reconstructed the cellular hierarchy during early mouse embryo development, modeled the dynamic changes in gene expression patterns, and predicted how perturbing key transcriptional regulators induces lineage biases. The results were robust to differences between the two experimental platforms, RT-PCR and RNA sequencing. We selectively tested our predictions in Nanog mutants and found good agreement between SCUBA predictions and the experimental data. We further extended the utility of SCUBA by developing a method to reconstruct missing temporal-order information from a typical single-cell dataset. Analysis of a hematopoietic dataset suggests that this method is effective for reconstructing gene expression dynamics during human B-cell development. In summary, SCUBA is a useful single-cell data analysis tool that is well suited to investigating developmental processes.
Keywords: bifurcation; differentiation; gene expression; single cell.
Conflict of interest statement
The authors declare no conflict of interest.
Overview of the SCUBA method. (Top) Structure of single-cell data. Individual cell samples are ordered by their corresponding developmental time. (Middle and Bottom) Schematic of the two main steps of SCUBA. In the bottom panel, the parameter space is divided into two regions, corresponding to one attractor state (green region, I) or two attractor states (blue region, II). The surface above the parameter space shows the steady-state solutions for each parameter setting. Stable and unstable steady states are colored differently.
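The bottom panel distinguishes parameter regions with one attractor from regions with two. As a hedged illustration of that idea only, the sketch below uses the textbook pitchfork normal form dx/dt = a·x − x³ (an assumption, not the model SCUBA fits) and counts linearly stable steady states as the parameter a crosses zero. The functions `steady_states` and `n_attractors` are hypothetical names.

```python
import math


def steady_states(a):
    """Fixed points of the toy flow dx/dt = a*x - x**3, with stability flags.

    Hypothetical stand-in: the pitchfork normal form, not SCUBA's fitted model.
    Returns a list of (x, is_stable) pairs sorted by x.
    """
    roots = [0.0]  # x = 0 is a fixed point for every a
    if a > 0:
        r = math.sqrt(a)
        roots += [-r, r]  # the two extra branches exist only for a > 0
    states = []
    for x in sorted(roots):
        slope = a - 3 * x * x  # derivative of the right-hand side at x
        states.append((x, slope < 0))  # negative slope: locally stable
    return states


def n_attractors(a):
    """Number of linearly stable fixed points at parameter value a."""
    return sum(1 for _, stable in steady_states(a) if stable)


for a in (-1.0, 0.5, 1.0):
    print(a, n_attractors(a))
```

For a < 0 there is a single attractor at x = 0; for a > 0 that state becomes unstable and two symmetric attractors appear. At a = 0 the linearization is inconclusive (this is the bifurcation point itself), so the sketch reports zero stable states there and should not be queried at that value.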
Lineage tree reconstructed from single-cell RT-PCR data in mouse embryos. (A) Overall structure of the dynamic clustering, and projection of the clustering pattern onto the plane spanned by the two bifurcation directions. Note that these two directions, X32 and X64, are not exactly orthogonal. Each color represents a different cluster. Parent–progeny cluster pairs are connected by straight lines. (B) Relative weights of all genes along the two bifurcation directions. Genes with the largest and smallest weights along X32 and X64 are labeled; transcription factor labels are in red. (C) Change in gene expression variance associated with dynamic clustering. Node size represents the total variance of each cluster, color-coded as in A. Inset color bars compare the total variance before and after each bifurcation event, as indicated by the curly brackets. The total variance is further decomposed into two portions, corresponding to the bifurcation direction (red) and all other directions (blue).
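Panel C splits each cluster's total variance into the part along the bifurcation direction and the part in all other directions. A minimal sketch, assuming "total variance" means the trace of the sample covariance matrix Σ (the caption does not give the exact definition): for a unit vector u, the variance along u is uᵀΣu, and the remainder trace(Σ) − uᵀΣu is the variance in the orthogonal complement. `variance_split` is a hypothetical helper written in plain Python.

```python
import math
from statistics import fmean


def variance_split(cells, direction):
    """Split a cluster's total variance into the part along `direction`
    and the part in all directions orthogonal to it.

    cells: list of expression vectors (one per cell, same gene order, >= 2 cells).
    direction: a nonzero vector of gene weights; it is normalized here.
    Returns (total, along_direction, other_directions).
    """
    norm = math.sqrt(sum(w * w for w in direction))
    u = [w / norm for w in direction]  # unit-length bifurcation direction
    n_cells, n_genes = len(cells), len(direction)
    means = [fmean(c[g] for c in cells) for g in range(n_genes)]
    centered = [[c[g] - means[g] for g in range(n_genes)] for c in cells]
    # Total variance: sum of per-gene sample variances (trace of the covariance).
    total = sum(x * x for row in centered for x in row) / (n_cells - 1)
    # Variance of the projections onto u, i.e. u^T * Cov * u.
    proj = [sum(ui * xi for ui, xi in zip(u, row)) for row in centered]
    along = sum(p * p for p in proj) / (n_cells - 1)
    return total, along, total - along


# Hypothetical cluster of four cells measured on two genes.
cells = [[2.0, 1.0], [0.0, 1.0], [2.0, -1.0], [0.0, -1.0]]
print(variance_split(cells, [1.0, 0.0]))
```

Because the split is an exact identity of the trace, the two parts always add up to the total; only the share along u changes with the chosen direction.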
Reconstructed gene expression dynamics associated with the 32-cell (A and B) and 64-cell (C and D) bifurcations. (A and C) Histograms of cell populations along the bifurcation axis and fitted equilibrium distributions (dark curves). (B and D) The potential function V(x) inferred from the equilibrium distribution. The smooth surface is obtained by interpolation.
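Panels B and D invert an equilibrium distribution into a potential. Assuming one-dimensional overdamped dynamics dx = −V′(x) dt + σ dW along the bifurcation axis, the stationary density is P(x) ∝ exp(−2V(x)/σ²), so V(x) = −(σ²/2) ln P(x) up to an additive constant. The sketch below applies this relation to a plain histogram rather than to a fitted distribution as in the figure; the function name, bin count, σ = 1, and the synthetic data are all assumptions.

```python
import math
import random


def potential_from_samples(xs, n_bins=20, sigma=1.0):
    """Estimate V(x) at histogram bin centers from equilibrium samples.

    Assumes dx = -V'(x) dt + sigma dW, whose stationary density satisfies
    P(x) proportional to exp(-2 V(x) / sigma**2), so that
    V(x) = -(sigma**2 / 2) * ln P(x) + constant.
    Empty bins are skipped because ln(0) is undefined.
    """
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in xs:
        i = min(int((x - lo) / width), n_bins - 1)  # put x == hi in the last bin
        counts[i] += 1
    centers, potential = [], []
    for i, c in enumerate(counts):
        if c == 0:
            continue
        density = c / (len(xs) * width)  # normalized histogram height
        centers.append(lo + (i + 0.5) * width)
        potential.append(-0.5 * sigma ** 2 * math.log(density))
    vmin = min(potential)  # the additive constant is arbitrary; deepest well at 0
    return centers, [v - vmin for v in potential]


# Hypothetical data: two equally sized populations along one axis.
rng = random.Random(0)
xs = [rng.gauss(rng.choice((-1.0, 1.0)), 0.3) for _ in range(20000)]
centers, v = potential_from_samples(xs)
for c, vi in zip(centers, v):
    print(f"{c:+.2f} {vi:.2f}")
```

Two well-separated populations yield two wells near ±1 and a barrier near 0; the barrier height depends on σ, which this sketch simply fixes rather than estimates.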
Prediction of the effect of biological noise on the maintenance of lineage diversity. (A and B) Equilibrium distributions for the (A) 32- and (B) 64-cell populations when noise levels are changed by a factor K. Black line: distributions at the original noise level, as fitted to the data from the last stages in Fig. 3 A and C. Increasing the noise by a factor of 2 (K = 2, red line) broadens the distributions. Reducing the noise by a factor of 2 (K = 1/2, green line) increases the TE population at the 32-cell stage and markedly increases the PE population at the 64-cell stage, with near disappearance of the EPI population.
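Under the same assumed stationary form P(x) ∝ exp(−2V(x)/σ²), multiplying the noise variance σ² by K (an interpretation; the caption says only that noise levels change by a factor K) gives P_K(x) ∝ P(x)^(1/K). Lower noise therefore concentrates mass in the deeper well, and higher noise flattens the distribution. The sketch demonstrates this on a hypothetical tilted double-well potential; `rescale_noise`, `tilted_double_well`, and `left_mass` are illustrative names, not part of SCUBA.

```python
import math


def rescale_noise(density, k):
    """Equilibrium density after multiplying the noise variance by k.

    Assumes P(x) proportional to exp(-2 V(x) / sigma**2); scaling sigma**2
    by k then gives P_k(x) proportional to P(x) ** (1 / k). `density` holds
    values on an evenly spaced grid, and the result is renormalized on it.
    """
    powered = [p ** (1.0 / k) for p in density]
    z = sum(powered)
    return [p / z for p in powered]


def tilted_double_well(x):
    # Hypothetical potential with two wells; the +0.2*x tilt makes the left well deeper.
    return x ** 4 / 4 - x ** 2 / 2 + 0.2 * x


grid = [-2 + 0.01 * i for i in range(401)]
raw = [math.exp(-2 * tilted_double_well(x)) for x in grid]  # sigma = 1
z0 = sum(raw)
base = [p / z0 for p in raw]


def left_mass(p):
    """Probability mass on x < 0 (the deeper, left-hand well)."""
    return sum(pi for x, pi in zip(grid, p) if x < 0)


for k in (0.5, 1.0, 2.0):
    print(k, round(left_mass(rescale_noise(base, k)), 3))
```

If K is meant to scale the noise amplitude σ rather than the variance, the exponent becomes 1/K² instead of 1/K; the qualitative effect (lower noise favors the deeper well) is the same.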
Prediction and validation of the effect of perturbing transcription factors on lineage bias. (A) Predicted splitting probabilities for the 64-cell bifurcation (left axis) overlaid with the potential function V(x) (right axis). EPI and PE correspond to the local minima of the potential, and C (black dot) is the local maximum, located at approximately −0.5. The predicted effects of a twofold depletion of the transcription factors Gata4 (green dot) or Sox2 (red dot) are highlighted. (B and C) Predicted lineage bias at the (B) 32- and (C) 64-cell stage after a twofold depletion of each profiled transcription factor. (D) Heat map of Ct values for 48 genes (columns) in 25 whole embryos (rows, sorted by Nanog Ct values). Coexpressed genes are grouped by hierarchical clustering. (E) Lineage bias introduced by decreasing Nanog expression. Experimentally determined values are shown as black dots and model predictions as colored lines.
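Panel A plots splitting probabilities alongside the potential. For the assumed dynamics dx = −V′(x) dt + σ dW on an interval [a, b] with absorbing ends (for example, a and b placed at the two minima), the standard first-passage result for the probability of reaching b before a, starting from x, is π_b(x) = ∫_a^x exp(2V(y)/σ²) dy / ∫_a^b exp(2V(y)/σ²) dy. The sketch evaluates this with the trapezoid rule; the choice of endpoints, the grid size, and the symmetric example potential are assumptions.

```python
import math


def splitting_probability(potential, a, b, sigma=1.0, n=2000):
    """Return a function x -> probability of reaching b before a.

    Assumes dx = -V'(x) dt + sigma dW on [a, b] with absorbing ends, for which
    pi_b(x) = int_a^x exp(2V/sigma^2) dy / int_a^b exp(2V/sigma^2) dy.
    Integrals are approximated with the trapezoid rule on n intervals.
    """
    h = (b - a) / n
    ys = [a + i * h for i in range(n + 1)]
    w = [math.exp(2.0 * potential(y) / sigma ** 2) for y in ys]
    cum = [0.0]  # cumulative integral of w from a to each grid point
    for i in range(n):
        cum.append(cum[-1] + 0.5 * h * (w[i] + w[i + 1]))
    total = cum[-1]

    def pi_b(x):
        # Linearly interpolate the cumulative integral at x (clamped to [a, b]).
        t = min(max((x - a) / h, 0.0), float(n))
        i = min(int(t), n - 1)
        frac = t - i
        return (cum[i] + frac * (cum[i + 1] - cum[i])) / total

    return pi_b


def double_well(x):
    return x ** 4 / 4 - x ** 2 / 2  # minima at -1 and +1, maximum at 0


pi_right = splitting_probability(double_well, -1.0, 1.0)
for x in (-0.5, 0.0, 0.5):
    print(x, round(pi_right(x), 3))
```

With a symmetric potential the curve passes through 0.5 at the barrier top; a perturbation that tilts V moves that crossing point, which is one way to read the lineage biases highlighted in the figure.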
SCUBA analysis of the single-cell RNA-seq data in mouse embryos. (A) Lineage tree reconstructed by SCUBA. Node sizes are proportional to the number of cells. (B) Comparison of the 32-cell bifurcation directions derived from the RNA-seq vs. RT-PCR datasets. The scatter plot shows the gene weights for each of the 13 genes common to both datasets. (C) Distribution of gene weights for the 1,000 most variable genes. Some well-characterized regulators are indicated. (D) Equilibrium distribution and (E) potential function V(x) corresponding to the 32-cell bifurcation.
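Panel B compares the 32-cell bifurcation directions from the two platforms on their 13 shared genes. One simple way to quantify that agreement, sketched here as an assumption (the caption shows only a scatter plot), is to restrict both weight vectors to the shared genes and compute their cosine similarity and Pearson correlation. `compare_directions` is a hypothetical helper, and the gene names and weights in the example are placeholders, not data from the study.

```python
import math


def compare_directions(weights_a, weights_b):
    """Compare two bifurcation directions on the genes they share.

    weights_a, weights_b: dicts mapping gene name -> weight.
    Returns (shared gene names, cosine similarity, Pearson correlation).
    """
    shared = sorted(set(weights_a) & set(weights_b))
    xa = [weights_a[g] for g in shared]
    xb = [weights_b[g] for g in shared]
    # Cosine similarity of the restricted (uncentered) weight vectors.
    cosine = sum(p * q for p, q in zip(xa, xb)) / math.sqrt(
        sum(p * p for p in xa) * sum(q * q for q in xb)
    )
    # Pearson correlation: cosine similarity after centering each vector.
    ma, mb = sum(xa) / len(xa), sum(xb) / len(xb)
    cov = sum((p - ma) * (q - mb) for p, q in zip(xa, xb))
    pearson = cov / math.sqrt(
        sum((p - ma) ** 2 for p in xa) * sum((q - mb) ** 2 for q in xb)
    )
    return shared, cosine, pearson


# Placeholder weights; only genes present in both dicts are compared.
rtpcr = {"G1": 0.8, "G2": -0.5, "G3": 0.1, "G4": 0.3}
rnaseq = {"G1": 0.7, "G2": -0.6, "G3": 0.2, "G5": 0.9}
print(compare_directions(rtpcr, rnaseq))
```

Because the sign of a bifurcation direction is arbitrary, a value near −1 also indicates agreement, up to orientation.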
SCUBA analysis of human B-cell development data. (A) Inference of SCUBA pseudotime based on t-SNE and principal curve analysis. The dataset was reduced to a 3D space by using t-SNE (colored dots). The black line is the principal curve fitted to the data. Cells are color-coded by pseudotime. (B) Selected normalized gene expression profiles for cells sorted by SCUBA pseudotime. (C) Density plot of SCUBA pseudotimes (x axis) against Wanderlust pseudotimes (y axis). (D) Lineage tree inferred by applying SCUBA to the pseudotime estimated by our principal curve analysis (Top) or by Wanderlust (Bottom).
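Panel A assigns pseudotime by fitting a principal curve in t-SNE space. The sketch below covers only the final step, under the assumption that the fitted curve is already available as an ordered polyline: each cell is projected onto its nearest point on the curve, and the arc length to that point, rescaled to [0, 1], serves as pseudotime. Fitting the curve itself, and deciding which end is the start, are not shown; `project_onto_polyline` and `pseudotime` are hypothetical names.

```python
import math


def project_onto_polyline(point, curve):
    """Arc-length position of the point on `curve` closest to `point`.

    curve: ordered list of vertices (tuples of equal dimension) from start to end.
    """
    best_dist, best_s = math.inf, 0.0
    s0 = 0.0  # arc length at the start of the current segment
    for p, q in zip(curve, curve[1:]):
        seg = [qi - pi for pi, qi in zip(p, q)]
        seg_len2 = sum(c * c for c in seg)
        if seg_len2 == 0:
            t = 0.0  # degenerate segment: both vertices coincide
        else:
            # Orthogonal projection parameter, clamped to stay on the segment.
            t = sum((xi - pi) * ci for xi, pi, ci in zip(point, p, seg)) / seg_len2
            t = min(max(t, 0.0), 1.0)
        closest = [pi + t * ci for pi, ci in zip(p, seg)]
        d = math.dist(point, closest)
        seg_len = math.sqrt(seg_len2)
        if d < best_dist:
            best_dist, best_s = d, s0 + t * seg_len
        s0 += seg_len
    return best_s


def pseudotime(points, curve):
    """Projection arc lengths rescaled to [0, 1]."""
    s = [project_onto_polyline(x, curve) for x in points]
    lo, hi = min(s), max(s)
    return [(v - lo) / (hi - lo) for v in s]


# Hypothetical 3D embedding and an L-shaped curve standing in for a fitted one.
curve = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
cells = [(0.2, 0.1, 0.0), (1.05, 0.5, 0.1), (0.9, -0.1, 0.0)]
print(pseudotime(cells, curve))
```

Using arc length along the curve, rather than distance along a single axis, keeps the ordering meaningful when the trajectory bends, as it does in the example.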
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.
B-Lymphocytes / metabolism
Cell Differentiation / physiology
Embryo, Mammalian / cytology
Embryo, Mammalian / metabolism
Epigenesis, Genetic / physiology
Gene Expression Regulation, Developmental / physiology
Hematopoiesis / physiology
Homeodomain Proteins / genetics
Homeodomain Proteins / metabolism