Learning interpretable cellular and gene signature embeddings from single-cell transcriptomic data

Nat Commun. 2021 Sep 6;12(1):5261. doi: 10.1038/s41467-021-25534-2.

Abstract

The advent of single-cell RNA sequencing (scRNA-seq) technologies has revolutionized transcriptomic studies. However, large-scale integrative analysis of scRNA-seq data remains a challenge largely due to unwanted batch effects and the limited transferability, interpretability, and scalability of the existing computational methods. We present single-cell Embedded Topic Model (scETM). Our key contribution is the utilization of a transferable neural-network-based encoder while having an interpretable linear decoder via a matrix tri-factorization. In particular, scETM simultaneously learns an encoder network to infer cell type mixture and a set of highly interpretable gene embeddings, topic embeddings, and batch-effect linear intercepts from multiple scRNA-seq datasets. scETM is scalable to over 10^6 cells and confers remarkable cross-tissue and cross-species zero-shot transfer-learning performance. Using gene set enrichment analysis, we find that scETM-learned topics are enriched in biologically meaningful and disease-related pathways. Lastly, scETM enables the incorporation of known gene sets into the gene embeddings, thereby directly learning the associations between pathways and topics via the topic embeddings.
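
The linear decoder described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the names `theta`, `alpha`, `rho`, and `batch_intercepts`, the toy dimensions, and the softmax over genes are assumptions made for illustration, and the neural-network encoder is replaced by randomly drawn cell-topic mixtures.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def decode(theta, alpha, rho, batch_intercepts, batch_ids):
    """Tri-factorization decoder sketch (hypothetical names and shapes).

    theta            : (n_cells, n_topics)  cell-topic mixture, rows sum to 1
    alpha            : (n_topics, emb_dim)  topic embeddings
    rho              : (emb_dim, n_genes)   gene embeddings
    batch_intercepts : (n_batches, n_genes) per-batch linear intercepts
    batch_ids        : (n_cells,)           integer batch label of each cell
    """
    # Product of three matrices plus a batch-specific intercept per gene.
    logits = theta @ alpha @ rho + batch_intercepts[batch_ids]
    # Assumption: normalize over genes to get per-cell expression rates.
    return softmax(logits, axis=1)

rng = np.random.default_rng(0)
n_cells, n_genes, n_topics, emb_dim, n_batches = 5, 20, 3, 4, 2

# Stand-in for the encoder output: random mixtures instead of a trained network.
theta = softmax(rng.normal(size=(n_cells, n_topics)), axis=1)
alpha = rng.normal(size=(n_topics, emb_dim))
rho = rng.normal(size=(emb_dim, n_genes))
batch_intercepts = rng.normal(size=(n_batches, n_genes))
batch_ids = rng.integers(0, n_batches, size=n_cells)

rates = decode(theta, alpha, rho, batch_intercepts, batch_ids)

# Because the decoder is linear, topic-by-gene scores can be read off directly.
topic_gene = alpha @ rho                             # (n_topics, n_genes)
top_genes = np.argsort(-topic_gene, axis=1)[:, :5]   # top 5 gene indices per topic
```

Since the decoder contains no nonlinearity before the final normalization, each topic's gene scores come from the single product `alpha @ rho`, which is one way to read the interpretability claim in the abstract.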

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alzheimer Disease / genetics
  • Alzheimer Disease / pathology
  • Animals
  • Databases, Genetic*
  • Depressive Disorder, Major / genetics
  • Depressive Disorder, Major / pathology
  • Gene Expression Profiling / methods
  • Gene Expression Profiling / statistics & numerical data
  • Genes, Mitochondrial
  • Humans
  • Mice
  • Models, Genetic*
  • Neural Networks, Computer
  • RNA, Small Cytoplasmic
  • Retina / cytology
  • Retina / physiology
  • Sequence Analysis, RNA / methods
  • Sequence Analysis, RNA / statistics & numerical data*
  • Single-Cell Analysis / methods*

Substances

  • RNA, Small Cytoplasmic