Visualization, benchmarking and characterization of nested single-cell heterogeneity as dynamic forest mixtures

Brief Bioinform. 2022 Mar 10;23(2):bbac017. doi: 10.1093/bib/bbac017.


A major topic of debate in developmental biology centers on whether development is continuous, discontinuous, or a mixture of both. Pseudo-time trajectory models, which are optimal for visualizing cellular progression, model cell transitions as continuous state manifolds; they do not explicitly model real-time, complex, heterogeneous systems and are difficult to benchmark against temporal models. We present a data-driven framework that addresses these limitations: it takes temporal single-cell data collected at discrete time points as input and outputs a mixture of dependent minimum spanning trees (MSTs), which we call dynamic spanning forest mixtures (DSFMix). DSFMix uses decision-tree models to select genes that account for variations in multimodality, skewness and time. These genes are then used to build the forest through tree agglomerative hierarchical clustering and dynamic branch cutting. We first motivate the use of forest-based algorithms over single-tree approaches for visualizing and characterizing developmental processes. We next benchmark DSFMix against pseudo-time and temporal approaches in terms of feature selection, time correlation and network similarity. Finally, we demonstrate how DSFMix can be used to visualize, compare and characterize complex relationships during biological processes such as epithelial-mesenchymal transition, spermatogenesis, stem cell pluripotency, the early transcriptional response to hormones and the immune response to coronavirus disease. Our results indicate that, during normal development, gene expression exhibits a high proportion of non-uniformly distributed profiles that are mostly right-skewed and multimodal; multimodality in particular is characteristic of major steady states during development. Our study also identifies and validates gene signatures driving complex dynamic processes during somatic or germline differentiation.
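The two stages described above (distribution-aware gene selection, then one spanning tree per time point split into branches) can be illustrated with a minimal pure-Python sketch. It is not the DSFMix implementation, and every function name here is hypothetical. Several simplifying assumptions stand in for the published components: a hand-made score combining bias-adjusted skewness and Sarle's bimodality coefficient replaces the decision-tree gene selection; a fixed-height cut of each MST (which is equivalent to single-linkage agglomerative clustering) replaces dynamic branch cutting; and each time point's tree is built independently, whereas DSFMix links its trees across time.

```python
import math
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int, float]  # (cell_u, cell_v, Euclidean distance)


def _central_moments(xs: Sequence[float]) -> Tuple[float, float, float]:
    """Population central moments m2, m3, m4."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m2, m3, m4


def skewness(xs: Sequence[float]) -> float:
    """Bias-adjusted sample skewness G1 (0 for constant data); needs len(xs) > 2."""
    n = len(xs)
    m2, m3, _ = _central_moments(xs)
    if m2 == 0:
        return 0.0
    return (m3 / m2 ** 1.5) * math.sqrt(n * (n - 1)) / (n - 2)


def bimodality_coefficient(xs: Sequence[float]) -> float:
    """Sarle's bimodality coefficient; values above 5/9 hint at multimodality.
    Needs len(xs) > 3; returns 0 for constant data."""
    n = len(xs)
    m2, _, m4 = _central_moments(xs)
    if m2 == 0:
        return 0.0
    g2 = m4 / m2 ** 2 - 3.0  # population excess kurtosis
    big_g2 = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)  # bias-adjusted
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return (skewness(xs) ** 2 + 1) / (big_g2 + correction)


def select_genes(expr: Dict[str, Sequence[float]], top: int) -> List[str]:
    """Hypothetical stand-in for decision-tree selection: rank genes by
    |skewness| + bimodality coefficient and keep the `top` highest."""
    score = {g: abs(skewness(v)) + bimodality_coefficient(v) for g, v in expr.items()}
    return sorted(score, key=lambda g: (-score[g], g))[:top]


def prim_mst(points: Sequence[Sequence[float]]) -> List[Edge]:
    """Prim's algorithm on the complete Euclidean graph of cells, O(n^2)."""
    n = len(points)
    in_tree, best, parent = [False] * n, [math.inf] * n, [-1] * n
    if n:
        best[0] = 0.0
    edges: List[Edge] = []
    for _ in range(n):
        # Attach the closest cell not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def cut_tree(n: int, edges: List[Edge], threshold: float) -> List[int]:
    """Drop MST edges longer than `threshold` and label the remaining
    components (same result as single-linkage clustering cut at that height)."""
    root = list(range(n))

    def find(i: int) -> int:
        while root[i] != i:
            root[i] = root[root[i]]  # path halving
            i = root[i]
        return i

    for u, v, w in edges:
        if w <= threshold:
            root[find(u)] = find(v)
    labels: Dict[int, int] = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


def spanning_forest(cells_by_time: Dict[str, List[List[float]]],
                    threshold: float) -> Dict[str, List[int]]:
    """One cut MST per time point; time points are treated independently here."""
    return {t: cut_tree(len(p), prim_mst(p), threshold) for t, p in cells_by_time.items()}


# Toy cells described by two (already selected) genes at two time points.
cells = {
    "day0": [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]],
    "day2": [[0.0, 0.0], [0.2, 0.1], [0.3, 0.1]],
}
print(spanning_forest(cells, threshold=1.0))  # {'day0': [0, 0, 1, 1], 'day2': [0, 0, 0]}
```

In the toy data, day 0 splits into two branches because the only long MST edge (about 7.0) exceeds the cut height, while day 2 stays a single branch. A fixed threshold was chosen purely for transparency; an adaptive cut, as the abstract describes, would instead pick branch boundaries from the shape of each tree.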

Keywords: cell differentiation; forest mixtures; minimum spanning tree; multimodality; nested models; single-cell trajectory analysis.

Publication types

  • Research Support, N.I.H., Intramural

MeSH terms

  • Algorithms
  • Animals
  • Benchmarking*
  • Cellular Microenvironment
  • Data Analysis
  • Decision Trees
  • Gene Expression Profiling / methods
  • Humans
  • Models, Theoretical*
  • Single-Cell Analysis / methods*
  • Spermatogenesis