Randomized Controlled Trial
Nat Commun. 2023 Feb 2;14(1):568.
doi: 10.1038/s41467-023-36062-6.

Estimation of cell lineages in tumors from spatial transcriptomics data

Beibei Ru et al. Nat Commun.

Abstract

Spatial transcriptomics (ST) technology through in situ capturing has enabled topographical gene expression profiling of tumor tissues. However, each capturing spot may contain diverse immune and malignant cells, with different cell densities across tissue regions. Cell type deconvolution in tumor ST data remains challenging for existing methods designed to decompose general ST or bulk tumor data. We develop the Spatial Cellular Estimator for Tumors (SpaCET) to infer cell identities from tumor ST data. SpaCET first estimates cancer cell abundance by integrating a gene pattern dictionary of copy number alterations and expression changes in common malignancies. A constrained regression model then calibrates local cell densities and determines immune and stromal cell lineage fractions. SpaCET provides higher accuracy than existing methods based on simulation and real ST data with matched double-blind histopathology annotations as ground truth. Further, coupling cell fractions with ligand-receptor coexpression analysis, SpaCET reveals how intercellular interactions at the tumor-immune interface promote cancer progression.
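The constrained regression step mentioned in the abstract can be illustrated with a small, dependency-free sketch. It is not SpaCET's implementation: the solver (projected gradient descent), the constraint set (non-negative fractions whose sum may not exceed a budget such as one minus the malignant fraction), and the names `project_capped` and `constrained_fractions` are all assumptions made for illustration.

```python
def project_capped(v, cap):
    """Euclidean projection onto {f : f >= 0, sum(f) <= cap}."""
    if cap <= 0:
        return [0.0] * len(v)
    clipped = [max(x, 0.0) for x in v]
    if sum(clipped) <= cap:
        return clipped
    # Budget exceeded: project onto the simplex {f >= 0, sum(f) = cap}.
    running, theta = 0.0, 0.0
    for k, u_k in enumerate(sorted(v, reverse=True), start=1):
        running += u_k
        shift = (running - cap) / k
        if u_k - shift > 0:
            theta = shift
    return [max(x - theta, 0.0) for x in v]


def constrained_fractions(reference, spot, cap=1.0, steps=2000):
    """Least-squares fit of spot ~ reference @ f with f >= 0 and sum(f) <= cap.

    reference: list of rows (genes x cell types); spot: expression per gene.
    Hypothetical helper, solved by projected gradient descent.
    """
    n_genes, n_types = len(reference), len(reference[0])
    # The squared Frobenius norm bounds the largest eigenvalue of R^T R,
    # so 1 / lipschitz is a safe step size.
    lipschitz = sum(x * x for row in reference for x in row)
    f = [0.0] * n_types
    for _ in range(steps):
        residual = [sum(r * fj for r, fj in zip(row, f)) - y
                    for row, y in zip(reference, spot)]
        grad = [sum(reference[g][j] * residual[g] for g in range(n_genes))
                for j in range(n_types)]
        f = project_capped([fj - gj / lipschitz for fj, gj in zip(f, grad)], cap)
    return f


# Toy reference with 4 genes and 2 lineages; the spot mixes them as 0.3 and 0.5.
reference = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
spot = [0.3, 0.5, 0.8, 0.6]
fractions = constrained_fractions(reference, spot, cap=0.9)
leftover = 0.9 - sum(fractions)  # share of the budget no lineage explains
```

In this toy setup the budget left over after fitting plays the role of an unexplained share, which is one way to read the "unidentifiable component" named in the Fig. 1 caption.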

PubMed Disclaimer

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Inferring cell fractions and interactions in tumor spatial transcriptomics.
a Three stages from input spatial transcriptomics (ST) data to cell lineage fractions and intercellular interactions. b Malignant cell fraction inference through a gene pattern dictionary. For a tumor ST dataset, SpaCET uses a dictionary of copy number alterations or tumor transcriptome patterns to identify tumor spots and further computes an ST-specific malignant expression profile. Then, SpaCET correlates the ST-specific malignant profile with the expression profile of each spot and normalizes the correlation coefficients to 0–1 as the malignant fractions of all spots. c Hierarchical deconvolution of nonmalignant cell fractions. Based on a hierarchical cell reference from the public scRNA-seq data atlas, SpaCET utilizes a constrained linear regression to estimate cell fractions on two levels. For level one, SpaCET decomposes the nonmalignant cell fractions into major lineages and unidentifiable components. For level two, major lineage fractions are further decomposed into corresponding sublineage fractions. d Cell–cell interaction analysis by testing cell colocalizations and ligand–receptor interactions. Based on inferred cell fractions, SpaCET measures cell colocalization through correlations across spots. Then, for the cell-type colocalized spots, SpaCET tests the significance of ligand–receptor co-expression as further evidence of physical interaction.
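The correlate-then-normalize step in panel b can be sketched as follows. The caption only states that correlation coefficients are normalized to 0–1; plain min-max scaling, the use of Pearson correlation, and the function names are assumptions here, not details confirmed by the source.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def malignant_fractions(spots, malignant_profile):
    """Correlate each spot with the malignant profile, then rescale to [0, 1].

    Min-max scaling is an assumed normalization chosen for illustration.
    """
    r = [pearson(spot, malignant_profile) for spot in spots]
    lo, hi = min(r), max(r)
    if hi == lo:
        return [0.0] * len(r)
    return [(value - lo) / (hi - lo) for value in r]


# Three toy spots over three genes; the first matches the profile exactly.
spots = [[1, 2, 3], [3, 2, 1], [1, 3, 2]]
profile = [1, 2, 3]
fractions = malignant_fractions(spots, profile)
```

With this scaling the most profile-like spot receives a fraction of 1 and the least profile-like spot receives 0, so the result is relative to the spots in the dataset.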
Fig. 2. Performance evaluation based on simulated ST data.
a Cell lineage proportions in 10 tumor scRNA-seq datasets used for ST simulation. b Hierarchical clustering of cell lineage reference profiles from scRNA-seq datasets based on marker gene set similarities. c Performance in intra-dataset validation for each scRNA-seq dataset (row) and cell type (column). The color in the heatmap presents the Pearson correlation (r) between predicted and known cell fractions. Gray in the heatmap indicates cell types missing from the scRNA-seq dataset. Boxplots on the top present r values of the same cell type across all datasets (n = 10). Boxplots on the right present r values of all cell type predictions in the same dataset. We shuffled spot identities of cell type fraction vectors within each synthetic ST dataset and computed r values as random controls. For boxplots, the thick line represents the median value. The bottom and top of the boxes are the 25th and 75th percentiles (interquartile range). The whiskers encompass 1.5 times the interquartile range. d Performance of inter-dataset validation between scRNA-seq cohorts. The column and row labels show the scRNA-seq datasets (n = 10) used to generate cell-type reference profiles and synthetic ST data, respectively. The color in the heatmap presents the median Pearson correlation (r) between predicted and known cell fractions across all cell types. Boxplots and random controls are plotted as in panel c. e Performance comparison between SpaCET and previous methods (colors ordered as in panel f). A dot represents a simulated ST dataset synthesized from a single scRNA-seq dataset (n = 10). The y value of an ST dataset presents the median Pearson correlation r between predicted and known cell fractions across cell types. All tools used the leave-one-out signature in panel d. The difference between SpaCET and other tools was evaluated by the two-sided Wilcoxon signed-rank test. A star indicates that SpaCET is significantly better than the other tools (BH-adjusted p value <0.05). Bar height denotes the average value across simulated ST datasets; error bars denote standard errors. f Comparison of running time and memory consumption, using a simulated ST dataset of 1200 spots with default parameters.
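Panel e reports BH-adjusted p values. As a reminder of what that adjustment does, here is a minimal sketch of the standard Benjamini–Hochberg step-up procedure; the function name `bh_adjust` and the example p values are illustrative and do not come from the paper.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so each adjusted value is the
    # minimum of p * m / rank over all ranks at or above the current one.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p values for four hypothetical method comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = bh_adjust(raw)
significant = [q < 0.05 for q in adjusted]
```

The adjusted values are monotone in the raw p values and never exceed 1, which is why a single threshold such as 0.05 can be applied to all comparisons at once.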
Fig. 3. Performance validation based on double-blind pathology annotations.
a Multiple tumor ST datasets used for performance evaluation. The human body outline was generated using BioRender. b An example hematoxylin and eosin (H&E) image with double-blind pathology annotations. c Unidentifiable component fractions (left) and unique molecular identifier (UMI) counts (right) across spots in both high and low cellular density regions. The group values were compared by calculating the Cohen’s d effect size and the two-sided Wilcoxon rank-sum test. For the boxplot, the thick line represents the median value. The bottom and top of the boxes are the 25th and 75th percentiles (interquartile range). The whiskers encompass 1.5 times the interquartile range. d Fractions of malignant, stromal, macrophage, and lymphocyte cells, decomposed by SpaCET. e Receiver operating characteristic (ROC) curves of cell fraction prediction. This example is based on the cell region annotation in panel b. For each method, the ROC curve presents false-positive rates against true-positive rates at different thresholds of cell fraction across spots. f Performance comparison among methods. Each dot represents a dataset (n = 8 for each bar). The y-axis presents the area under the ROC curve (AUC) value of cell fraction decompositions for each method. The subpanels represent the results in distinct tumor regions, and the last subpanel considers data from all three region types together. In each subpanel, the difference between SpaCET and other tools was evaluated by the two-sided Wilcoxon signed-rank test. A star indicates that SpaCET is significantly better than the other tools (BH-adjusted p value <0.05). Bar height denotes the average value across ST datasets; error bars denote standard errors.
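Two statistics in this caption are easy to state in code: the AUC of a predicted cell fraction against a binary region annotation, and Cohen's d between two groups of spots. The sketch below uses the pairwise (Mann–Whitney) form of the AUC and the pooled-standard-deviation form of Cohen's d. Both are standard definitions, but whether the authors used exactly these variants is an assumption, and the toy data are invented.

```python
from math import sqrt
from statistics import mean, variance


def roc_auc(scores, labels):
    """AUC as the probability that an annotated spot outscores an unannotated one.

    scores: predicted cell fractions; labels: truthy where the pathologist
    annotated the region. Ties count as half a win.
    """
    pos = [s for s, flag in zip(scores, labels) if flag]
    neg = [s for s, flag in zip(scores, labels) if not flag]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative spot")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cohens_d(a, b):
    """Cohen's d with the pooled sample standard deviation."""
    pooled = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / (len(a) + len(b) - 2)
    return (mean(a) - mean(b)) / sqrt(pooled)


# Toy example: four spots, the first two annotated as tumor.
auc = roc_auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
effect = cohens_d([2, 4, 6], [1, 2, 3])
```

The pairwise form gives the same value as integrating the ROC curve over all thresholds, and it avoids choosing a threshold grid.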
Fig. 4. Application of SpaCET to a colon cancer Slide-seq dataset.
a The H&E-stained image with double-blind pathology annotations. b Fractions of malignant and stromal cells, decomposed by SpaCET. c ROC curves of cell fraction prediction based on the annotation in panel a, shown as in Fig. 3e. d Spatial localization of major cell lineages. The cell type of a bead is defined as the most abundant cell type in that bead. e Number of beads for each cell type.
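The bead labeling rule in panels d and e (each bead takes its most abundant cell type, then beads are counted per type) amounts to an argmax followed by a tally. A minimal sketch, with a made-up fraction table and hypothetical bead and function names:

```python
from collections import Counter


def dominant_cell_types(fractions):
    """Map each bead to the cell type with the largest inferred fraction.

    fractions: dict of bead id -> dict of cell type -> fraction.
    If two types tie, the one listed first for that bead is kept.
    """
    return {bead: max(types, key=types.get) for bead, types in fractions.items()}


# Invented fractions for three beads.
toy = {
    "bead_1": {"Malignant": 0.7, "CAF": 0.2, "Macrophage": 0.1},
    "bead_2": {"Malignant": 0.1, "CAF": 0.6, "Macrophage": 0.3},
    "bead_3": {"Malignant": 0.5, "CAF": 0.3, "Macrophage": 0.2},
}
labels = dominant_cell_types(toy)
beads_per_type = Counter(labels.values())
```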
Fig. 5. SpaCET identifies intercellular interactions in the breast tumor.
a Spearman correlations of cell-type fractions across breast tumor ST spots. Each node in the network represents a cell type, and the size of a node refers to the average fraction of this cell type across all spots. Each edge represents the colocalization of a cell-type pair, and the size of an edge refers to the fraction product of this cell-type pair. b Spearman correlation analysis between reference profiles (x-axis) and between cell-type fractions (y-axis). Each dot represents a cell-type pair. The straight line presents the weighted linear regression result with the gray shadow as the 95% confidence interval. c Ligand–receptor interaction network scores for all spots. d CAF and M2 fractions across all spots. Each dot represents an ST spot. According to the CAF or M2 cell fractions, spots were grouped into four categories: CAF–M2 colocalized (top 15% in both CAF and M2, n = 182), CAF-dominated (top 15% in CAF and bottom 75% in M2, n = 295), M2-dominated (top 15% in M2 and bottom 75% in CAF, n = 234), and others (n = 3102). The straight line presents the linear regression between the cell fractions of CAF and M2 with the gray shadow as the 95% confidence interval. The Rho and p values are computed from the two-sided Spearman correlation test (n = 3813 spots). e Spatial distribution of CAF–M2 colocalized and CAF/M2-dominated spots in panel d. f Difference of L–R interaction network score between CAF–M2 colocalized spots and CAF/M2-dominated spots in panel d. For the boxplot, the thick line represents the median value. The bottom and top of the boxes are the 25th and 75th percentiles (interquartile range). The whiskers encompass 1.5 times the interquartile range. Group values were compared by Cohen’s d effect size and two-sided Wilcoxon rank-sum test. g L–R pairs mediating the CAF–M2 interaction in the current breast cancer tissue. Arrows point from ligand to receptor.
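The four-way grouping in panel d can be written down as a rank-based rule. The sketch below is one possible reading: how ties and group boundaries are handled (rounding the 15% and 75% cutoffs to whole spots, breaking ties by input order) is an assumption, as are the function names and the toy data.

```python
def rank_positions(values):
    """Zero-based ascending rank of each value; ties broken by input order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    positions = [0] * len(values)
    for position, i in enumerate(order):
        positions[i] = position
    return positions


def group_spots(caf, m2, top=0.15, bottom=0.75):
    """Label each spot by its CAF and M2 fraction ranks (hypothetical helper)."""
    n = len(caf)
    n_top, n_bottom = round(top * n), round(bottom * n)
    caf_pos, m2_pos = rank_positions(caf), rank_positions(m2)
    groups = []
    for c, m in zip(caf_pos, m2_pos):
        caf_top, m2_top = c >= n - n_top, m >= n - n_top
        if caf_top and m2_top:
            groups.append("CAF-M2 colocalized")
        elif caf_top and m < n_bottom:
            groups.append("CAF-dominated")
        elif m2_top and c < n_bottom:
            groups.append("M2-dominated")
        else:
            groups.append("other")
    return groups


# Twenty toy spots: CAF rises with the spot index; M2 mostly does too,
# except spot 0 (high M2, low CAF) and spot 18 (high CAF, low M2).
caf = [float(i) for i in range(20)]
m2 = [float(i) for i in range(20)]
m2[0], m2[18] = 18.0, 0.0
groups = group_spots(caf, m2)
```

Spots in the top 15% of one cell type but between the 75th and 85th percentiles of the other fall into "other" under this rule, which matches the four categories summing to all 3813 spots in the caption.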
Fig. 6. Association between CAF–M2 interactions and malignant cell invasion.
a The CAF–M2 spots at the interface between the tumor and immune regions in a breast tumor example. The distance between a CAF–M2 spot and a tumor-immune interface is the shortest path of this spot to the interface. b Distance between CAF–M2 interaction spots and tumor-immune boundaries. The green line represents the average distance between each CAF–M2 spot and the tumor-immune interface. The null distribution of distance was computed through 1000 randomizations. c The spots of close and distant malignant cells relative to CAF–M2 interaction spots. d Gene set enrichment analysis (GSEA) of the differential expression between malignant cells close to and distant from CAF–M2 spots, conditioned on malignant cell fractions. e The GSEA enrichment plot of the epithelial-mesenchymal transition pathway from panel d. The x-axis represents the gene list ranked by the differential expression analyzed in panel d. The black vertical bars along the x-axis represent genes from the labeled pathway. If the genes (vertical bars) of a pathway tend to be enriched in the left-most part of the x-axis, this indicates that the pathway is active in close malignant cells, and vice versa. The cyan line is the enrichment curve of the pathway, and the red dashed lines mark the maximum and minimum values of the cyan line. The p value is computed through the two-sided permutation test (n = 1000 randomizations) adjusted by the Benjamini–Hochberg procedure.
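The randomization test in panel b compares the observed mean distance of CAF–M2 spots to the interface against a null distribution from 1000 randomizations. The caption does not say how spots were randomized. The sketch below assumes that an equal number of spots is drawn at random from all spots, and it uses a one-sided p value (how often a random set is at least as close as the observed one). The function name, the seed, and the toy distances are illustrative.

```python
import random
from statistics import mean


def distance_permutation_test(distances, selected, n_perm=1000, seed=0):
    """Compare the mean interface distance of selected spots with random spot sets.

    distances: distance of every spot to the tumor-immune interface.
    selected: indices of the spots of interest (for example CAF-M2 spots).
    Returns the observed mean, the null means, and a one-sided p value.
    """
    rng = random.Random(seed)
    observed = mean(distances[i] for i in selected)
    null = [mean(rng.sample(distances, len(selected))) for _ in range(n_perm)]
    # Add one to numerator and denominator so the p value is never exactly zero.
    p_value = (1 + sum(m <= observed for m in null)) / (n_perm + 1)
    return observed, null, p_value


# Toy tissue of 100 spots whose distances are 0..99; the five spots of
# interest are the five closest to the interface.
distances = [float(i) for i in range(100)]
observed, null, p_value = distance_permutation_test(distances, [0, 1, 2, 3, 4])
```

A small p value here means that randomly chosen spots are rarely as close to the interface as the selected ones, which is the pattern the green line in panel b is meant to show.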
