Proc Natl Acad Sci U S A, 111 (52), E5643-50

Bifurcation Analysis of Single-Cell Gene Expression Data Reveals Epigenetic Landscape

Eugenio Marco et al. Proc Natl Acad Sci U S A.

Abstract

We present single-cell clustering using bifurcation analysis (SCUBA), a novel computational method for extracting lineage relationships from single-cell gene expression data and modeling the dynamic changes associated with cell differentiation. SCUBA draws techniques from nonlinear dynamics and stochastic differential equation theories, providing a systematic framework for modeling complex processes involving multilineage specifications. By applying SCUBA to analyze two complementary, publicly available datasets, we successfully reconstructed the cellular hierarchy during early development of mouse embryos, modeled the dynamic changes in gene expression patterns, and predicted the effects of perturbing key transcriptional regulators on inducing lineage biases. The results were robust with respect to experimental platform differences between RT-PCR and RNA sequencing. We selectively tested our predictions in Nanog mutants and found good agreement between SCUBA predictions and the experimental data. We further extended the utility of SCUBA by developing a method to reconstruct missing temporal-order information from a typical single-cell dataset. Analysis of a hematopoietic dataset suggests that our method is effective for reconstructing gene expression dynamics during human B-cell development. In summary, SCUBA provides a useful single-cell data analysis tool that is well-suited for the investigation of developmental processes.

Keywords: bifurcation; differentiation; gene expression; single cell.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Overview of the SCUBA method. (Top) Structure of single-cell data. Individual cell samples are ordered by their corresponding developmental time. (Middle and Bottom) Schematic of the two main steps of SCUBA. In the bottom panel, the parameter space is divided into two regions, corresponding to one (green region and I) or two attractor states (blue region and II), respectively. The surface on top of parameter space shows the steady-state solutions corresponding to each parameter setting. Stable and unstable steady states are colored differently.
Fig. 2.
Lineage tree reconstructed based on single-cell RT-PCR data in mouse embryos. (A) Overall structure of the dynamic clustering and projection of the clustering pattern onto the plane spanned by the two bifurcation directions. Note that these two directions, X32 and X64, are not exactly orthogonal. Each color represents a different cluster. Parent–progeny cluster pairs are connected by straight lines. (B) Relative weight of all genes on the two bifurcation directions. Genes with the biggest and smallest weights along X32 and X64 are labeled. Transcription factor labels are in red. (C) Change of gene expression variance associated with dynamic clustering. Node size represents total variance for each cluster, color-coded as in A. Inset color bars compare the total variance before and after each bifurcation event, as indicated by the curly brackets. The total variance is further decomposed into two portions, corresponding to the bifurcation direction (red) and all other directions (blue).
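
The variance decomposition described in panel C can be illustrated with a minimal sketch. It assumes a cells-by-genes matrix and a single bifurcation direction vector; the function name `variance_split` and the synthetic data are hypothetical and are not taken from the SCUBA code.

```python
import numpy as np

def variance_split(X, direction):
    """Split total variance into the part along `direction` and the rest.

    X: array of shape (n_cells, n_genes); direction: array of shape (n_genes,).
    """
    w = direction / np.linalg.norm(direction)   # unit vector along the axis
    Xc = X - X.mean(axis=0)                     # center each gene
    total = Xc.var(axis=0).sum()                # trace of the covariance matrix
    along = (Xc @ w).var()                      # variance of the 1D projection
    return along, total - along

# Hypothetical data: 500 cells, 5 genes, most variance carried by gene 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) * np.array([3.0, 1.0, 1.0, 1.0, 1.0])
along, other = variance_split(X, np.array([1.0, 0.0, 0.0, 0.0, 0.0]))
print(f"along bifurcation axis: {along:.2f}, all other directions: {other:.2f}")
```

Because the projection uses a unit vector, the two portions always sum to the total variance, which is the property the red and blue bars rely on.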
Fig. 3.
Reconstructed gene expression dynamics associated with the 32-cell (A and B) and 64-cell (C and D) bifurcations. (A and C) Histograms of cell populations along the bifurcation axis and fitted equilibrium distributions (dark curves). (B and D) The potential function V(x) inferred from the equilibrium distribution. The smooth surface is obtained by interpolation.
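
One way to see how a potential can be read off an equilibrium distribution is the following sketch. It assumes one-dimensional overdamped dynamics dx = -V'(x) dt + sigma dW, for which the stationary density satisfies p(x) ∝ exp(-2 V(x) / sigma^2), so that V(x) = -(sigma^2 / 2) ln p(x) up to an additive constant. The function name, the histogram-based density estimate, and the value of `sigma` are assumptions for illustration; the paper fits a parametric equilibrium distribution rather than using raw histogram counts.

```python
import numpy as np

def potential_from_samples(x, sigma=1.0, bins=30):
    """Estimate V(x) = -(sigma^2 / 2) * ln p(x) from samples along one axis."""
    density, edges = np.histogram(x, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    keep = density > 0                        # log is undefined for empty bins
    V = -(sigma**2 / 2.0) * np.log(density[keep])
    return centers[keep], V - V.min()         # shift so the minimum is zero

# Hypothetical check: for V(x) = x^2 / 2 and sigma = 1 the equilibrium
# distribution is Gaussian with variance sigma^2 / 2 = 0.5.
rng = np.random.default_rng(0)
samples = rng.normal(scale=np.sqrt(0.5), size=200_000)
centers, V = potential_from_samples(samples, sigma=1.0)
inner = np.abs(centers) < 1.0                 # well-populated bins only
curvature = np.polyfit(centers[inner], V[inner], 2)[0]
print(f"fitted quadratic coefficient: {curvature:.3f}")
```

The estimate is only reliable where bins are well populated, which is why the check above restricts the fit to the central region.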
Fig. 4.
Prediction of the effect of biological noise on the maintenance of lineage diversity. (A and B) Equilibrium distributions for the (A) 32- and (B) 64-cell population when noise levels were changed by a factor K. Black line, cell counts as in our fits to the data in the last stages in Fig. 3 A and C. Increasing the noise by a factor of 2 (K = 2, red line) broadens the distributions. Reducing the noise levels by a factor of 2 (K = 1/2, green line) leads to an increase of the TE population at the 32-cell stage and a very significant increase of the PE population at the 64-cell stage, with a near disappearance of the EPI population.
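
The qualitative effect of rescaling the noise can be reproduced with a toy potential. The sketch below assumes that K multiplies the noise amplitude sigma in dx = -V'(x) dt + sigma dW, so the equilibrium density becomes p_K(x) ∝ exp(-2 V(x) / (K sigma)^2). The tilted double-well potential, the value of `sigma`, and the function names are hypothetical and are not the potentials fitted in the paper.

```python
import numpy as np

def equilibrium_density(V, x, sigma, K=1.0):
    """Normalized stationary density on a uniform grid for noise amplitude K * sigma."""
    dx = x[1] - x[0]
    w = np.exp(-2.0 * (V - V.min()) / (K * sigma) ** 2)
    return w / (w.sum() * dx)

def left_well_fraction(p, x):
    """Probability mass on the x < 0 side of the grid."""
    dx = x[1] - x[0]
    return p[x < 0].sum() * dx

x = np.linspace(-2.5, 2.5, 2001)
V = x**4 / 4 - x**2 / 2 + 0.1 * x      # hypothetical; the deeper well is at x < 0
sigma = 0.5                             # hypothetical baseline noise amplitude

for K in (0.5, 1.0, 2.0):
    p = equilibrium_density(V, x, sigma, K)
    print(f"K = {K}: fraction in deeper well = {left_well_fraction(p, x):.3f}")
```

In this toy setting, lowering the noise concentrates cells in the deeper well and raising it spreads them across both wells, which mirrors the trend the caption describes.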
Fig. 5.
Prediction and validation of the effect of perturbing transcription factors on lineage bias. (A) Predicted splitting probabilities for the 64-cell bifurcation (left axis) overlaid with the potential function V(x) (right axis). EPI and PE correspond to the local minima of the potential and C (black dot) is the local maximum of the potential, located at approximately −0.5. The predicted effect of a twofold depletion of the transcription factors Gata4 (green dot) or Sox2 (red dot) is highlighted. (B and C) Predicted lineage bias at the (B) 32- and (C) 64-cell stage after a twofold depletion of each profiled transcription factor. (D) Heat map shows Ct values for 48 genes (columns) in 25 different whole embryos (rows sorted by Nanog Ct values). Coexpressed genes are grouped together by hierarchical clustering. (E) Lineage bias introduced by decreasing Nanog expression values. Experimentally determined values are shown as black dots and model predictions as colored lines.
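
A splitting probability of the kind plotted in panel A can be sketched with the textbook result for one-dimensional diffusion. For dx = -V'(x) dt + sigma dW on an interval [a, b], the probability of reaching b before a when starting from x is the integral of exp(2 V(y) / sigma^2) from a to x, divided by the same integral from a to b. Whether the paper computes its curves in exactly this way is an assumption; the potential, `sigma`, and the shifted starting point below are hypothetical.

```python
import numpy as np

def splitting_probability(V, x, sigma):
    """Probability of hitting x[-1] before x[0], for each starting point on the grid."""
    w = np.exp(2.0 * (V - V.max()) / sigma**2)          # rescaled to avoid overflow
    steps = 0.5 * (w[1:] + w[:-1]) * np.diff(x)         # trapezoid rule per interval
    cum = np.concatenate(([0.0], np.cumsum(steps)))
    return cum / cum[-1]

# Hypothetical symmetric double well with minima at x = -1 and x = +1
# and a local maximum (the barrier) at x = 0.
x = np.linspace(-1.0, 1.0, 401)
V = x**4 / 4 - x**2 / 2
pi = splitting_probability(V, x, sigma=0.5)

# A perturbation is modeled here as a shift of the starting position
# away from the barrier top; the size of the shift is made up.
for start in (0.0, -0.1):
    print(f"start = {start:+.1f}: P(reach right well first) = {np.interp(start, x, pi):.3f}")
```

Starting exactly on the barrier of a symmetric potential gives equal odds for the two wells, and small shifts of the starting point change the odds steeply when the noise is low, which is why modest changes in expression can produce a visible lineage bias.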
Fig. 6.
SCUBA analysis of the single-cell RNA-seq data in mouse embryo. (A) Lineage tree reconstructed by SCUBA. Node sizes are proportional to number of cells. (B) Comparison of the 32-cell bifurcation directions derived from the RNA-seq vs. RT-PCR datasets. Scatter plot shows the gene weights associated with each of the 13 common genes between the two datasets. (C) Distribution of gene weights for the 1,000 most variable genes. Some well-characterized regulators are indicated. (D) Equilibrium distribution and (E) potential function V(x) corresponding to the 32-cell bifurcation.
Fig. 7.
SCUBA analysis of human B-cell development data. (A) Inference of SCUBA pseudotime based on t-SNE and principal curve analysis. The dataset was reduced to a 3D space by using t-SNE (colored dots). Black line is the principal curve fitted to the data. Cells are color-coded by pseudotime. (B) Selected normalized gene expression profiles for cells sorted using SCUBA pseudotime. (C) Density plot for the distribution of SCUBA pseudotimes (x axis) against Wanderlust pseudotimes (y axis). (D) Lineage tree inferred by applying SCUBA to the pseudotime estimated by our principal curve analysis (Top) or Wanderlust (Bottom).
