Nat Commun. 2023 Feb 8;14(1):679.
doi: 10.1038/s41467-023-36383-6.

Cartography of Genomic Interactions Enables Deep Analysis of Single-Cell Expression Data

Md Tauhidul Islam et al. Nat Commun. 2023.

Abstract

Remarkable advances in single-cell genomics have presented unique challenges and opportunities for addressing a wealth of biomedical questions. High-dimensional genomic data are inherently complex because of the intertwined relationships among genes. Existing methods, including emerging deep learning-based approaches, do not consider the underlying biological characteristics during data processing, which greatly compromises the performance of data analysis and hinders the full utilization of state-of-the-art genomic techniques. In this work, we develop an entropy-based cartography strategy that transforms high-dimensional gene expression data into a configured image format, referred to as a genomap, with explicit integration of the genomic interactions. This cartography casts the gene-gene interactions into the spatial configuration of genomaps and enables us to extract deep genomic interaction features and discover underlying discriminative patterns in the data. We show that, for a wide variety of applications (cell clustering and recognition, gene signature extraction, single-cell data integration, cellular trajectory analysis, dimensionality reduction, and visualization), the proposed approach drastically improves the accuracy of data analysis compared with state-of-the-art techniques.

Conflict of interest statement

A patent application based on this work has been submitted (application number: 63/479,724) by the Board of Trustees of the Leland Stanford Junior University. The names of the inventors are Lei Xing and Md Tauhidul Islam. The patent application covers all the contents of the manuscript.

Figures

Fig. 1. Deep analysis of scRNA-seq data by using genomap and genoNet.
a Workflow of genomap generation from scRNA-seq data. Here, i ≠ j ≠ k, with i, j, k = 1, …, n. Note that the genomap is dataset dependent and the gene distribution in the genomaps varies with the dataset. b GenoNet is applied to the genomaps to extract deep-level features for decision-making.
Fig. 2. Genomaps of 100 cells belonging to 10 different classes from the Tabula Muris dataset.
Each row in the figure corresponds to a class. For each class, the 10 cells show very similar genomap patterns. Here, the smallest value in the genomaps is denoted by blue and the largest value is denoted by yellow.
Fig. 3. Genomaps of 100 cells belonging to 10 different classes from the ischaemic sensitivity dataset acquired from the lung.
Each row in the figure corresponds to a class. For each class, the 10 cells show very similar genomap patterns. For example, in mature B cells (2nd row), the genes in a circular ring close to the boundary have high expression, whereas the genes in other areas have low expression. In naive B cells (3rd row), by contrast, the genes in a circular region close to the center have very high expression. Here, the lowest gene expression value is denoted by blue and the highest value is denoted by yellow.
Fig. 4. Visualization of the ischaemic sensitivity dataset (left: lung, middle: esophagus, right: spleen).
a UMAP visualizations of raw data. b UMAP visualizations of the genomap features at the fully connected layer of the genoNet. Major improvements in cluster separation are indicated by arrows. Color legends of the data classes are given in Supplementary Fig. S22. c Classification accuracy of different techniques including genomap+genoNet. Here, Cell-ID(c) and Cell-ID(g) denote the Cell-ID technique with the cell-to-cell and cell-to-group matching formulations, respectively. Source data are provided as a Source Data file.
Fig. 5. Analysis of the T cell landscape dataset.
(a left) UMAP visualizations of the raw data. (a middle and right) UMAP visualizations of features from the fully connected layer of the genoNet for the training and testing datasets. Major improvements in cluster separation are indicated by arrows. Color legends of the data classes are given in Supplementary Fig. S23. b Classification accuracies of the genomaps with the proposed approach and existing techniques. Source data are provided as a Source Data file.
Fig. 6. Class activation maps of the cells displayed in Fig. 2.
The intensity of a pixel in the map denotes the importance (scaled from 0 to 1) of the corresponding gene. For each class, very similar patterns of class activation maps are observed for the 10 cells, confirming the existence of a number of genes specific to a data class (class-specific gene set). Here, the lowest and highest values in the maps are denoted by blue and yellow colors, respectively.
Fig. 7. Integration of the single cell datasets obtained by using five different measurement protocols.
(a) UMAP visualizations of the embeddings resulting from different integration techniques (Seurat, Harmony, Online iNMF, and genomap+genoNet). Data from different measurement protocols are denoted by different colors. (b) The same UMAP visualizations of the embeddings, with the cell classes denoted by different colors. (c) Label transfer accuracy of different integration techniques (left to right: Segerstolpe, Baron, and Muraro datasets). Source data are provided as a Source Data file.
Fig. 8. Cellular trajectory analysis of the proto-vertebrate dataset by using different techniques.
(a-1st row) t-SNE and UMAP visualizations of the data. (a-2nd row, first and second columns) Unsupervised and supervised PHATE visualizations of the data. (a-2nd row, last column) PHATE visualization of the embedding from the unsupervised genoNet. In contrast to the results of existing techniques, the transitions of cells from initial gastrula to larva are quite evident in the visualization of the proposed approach. (b) DEMaP computed from embeddings of different techniques. Source data are provided as a Source Data file.
Fig. 9. Dimensionality reduction, clustering, and visualization of the retinal bipolar cells by different techniques.
a t-SNE, LDA, Siamese network, and unsupervised and supervised UMAP visualizations of the raw data. b t-SNE and UMAP visualizations of the embeddings from unsupervised genoNet. Major improvements in cluster separation are indicated by arrows. c Quantitative comparison of cluster quality of embeddings from different techniques in terms of accuracy, AR, Silhouette and NMI indices (from left to right panels). The bar heights denote the mean values of the indices for 1000 different initializations of Louvain clustering. GSNE and GMAP denote the t-SNE and UMAP embeddings obtained from genoNet features. Source data are provided as a Source Data file.
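
Two of the cluster-quality indices named in the Fig. 9 caption can be illustrated with a minimal sketch. It assumes that "AR" refers to the adjusted Rand index and that NMI is normalized by the arithmetic mean of the two label entropies; the exact definitions used in the paper are not stated on this page. The function names and the toy labels are hypothetical, and the Louvain clustering step itself is not reproduced: each "run" below simply stands in for the cluster labels of one initialization.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same cells."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare
    # Count pairs of cells that fall together in both, one, or neither labeling.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0  # degenerate case: both labelings are trivial
    return (sum_cells - expected) / (max_index - expected)


def _entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(labels_true, labels_pred):
    """Mutual information divided by the mean of the two label entropies (assumed normalization)."""
    n = len(labels_true)
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    mi = 0.0
    for (t, p), c in Counter(zip(labels_true, labels_pred)).items():
        mi += (c / n) * log(c * n / (count_true[t] * count_pred[p]))
    denom = 0.5 * (_entropy(labels_true) + _entropy(labels_pred))
    return 1.0 if denom == 0 else mi / denom


# Toy example: known cell classes and cluster labels from two hypothetical runs.
true_labels = [0, 0, 1, 1, 2, 2]
runs = [
    [1, 1, 0, 0, 2, 2],  # same partition as the truth, cluster ids permuted
    [0, 0, 0, 1, 1, 1],  # a coarser, partly wrong partition
]

# Averaging over runs mirrors the "mean over initializations" in the caption.
mean_ari = sum(adjusted_rand_index(true_labels, r) for r in runs) / len(runs)
mean_nmi = sum(normalized_mutual_information(true_labels, r) for r in runs) / len(runs)
print(mean_ari, mean_nmi)
```

Both indices depend only on how cells are grouped, not on the cluster ids themselves, which is why the first run scores as a perfect match even though its ids are permuted.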
