Deciphering cell types by integrating scATAC-seq data with genome sequences

Nat Comput Sci. 2024 Apr;4(4):285-298. doi: 10.1038/s43588-024-00622-7. Epub 2024 Apr 10.

Abstract

The single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) technology provides insight into gene regulation and epigenetic heterogeneity at single-cell resolution, but cell annotation from scATAC-seq remains challenging due to the high dimensionality and extreme sparsity of the data. Existing cell annotation methods mostly focus on the cell-by-peak matrix without fully utilizing the underlying genomic sequence. Here we propose a method, SANGO, for accurate single-cell annotation by integrating genome sequences around the accessibility peaks within scATAC-seq data. The genome sequences of peaks are encoded into low-dimensional embeddings, and then iteratively used to reconstruct the peak statistics of cells through a fully connected network. The learned weights are treated as regulatory modes that represent cells, and are used to align the query cells with the annotated cells in the reference data through a graph transformer network for cell annotation. SANGO was demonstrated to consistently outperform competing methods on 55 paired scATAC-seq datasets across samples, platforms and tissues. SANGO was also shown to detect unknown tumor cells through attention edge weights learned by the graph transformer. Moreover, from the annotated cells, we found cell-type-specific peaks that provide functional insights and biological signals through expression enrichment analysis, cis-regulatory chromatin interaction analysis and motif enrichment analysis.
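
The idea of representing cells by the weights of a layer that reconstructs peak accessibility from sequence embeddings can be illustrated with a minimal NumPy sketch. This is not the SANGO implementation: the encoder here is a fixed random projection of one-hot sequences standing in for a learned sequence encoder, the data are random toy values, and all names (`one_hot`, `embed_peaks`, `fit_cell_weights`) and hyperparameters are hypothetical. The graph transformer alignment step is not covered.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """One-hot encode a DNA string as a (len, 4) array; unknown bases stay all-zero."""
    out = np.zeros((len(seq), 4))
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            out[i, j] = 1.0
    return out

def embed_peaks(seqs, dim, rng):
    """Assumed stand-in encoder: fixed random projection plus tanh.
    A real model would learn this mapping from the sequences."""
    flat = np.stack([one_hot(s).ravel() for s in seqs])      # (n_peaks, 4 * length)
    proj = rng.normal(scale=1.0 / np.sqrt(flat.shape[1]),
                      size=(flat.shape[1], dim))
    return np.tanh(flat @ proj)                              # (n_peaks, dim)

def fit_cell_weights(emb, acc, lr=0.2, epochs=300):
    """Fit one dense layer so that sigmoid(emb @ W + b) approximates acc.

    emb: (n_peaks, dim) peak embeddings; acc: (n_peaks, n_cells) binary accessibility.
    Returns the per-cell weight vectors (n_cells, dim), the biases, and the loss history.
    """
    n_peaks, dim = emb.shape
    n_cells = acc.shape[1]
    W = np.zeros((dim, n_cells))
    b = np.zeros(n_cells)
    eps = 1e-9
    losses = []
    for _ in range(epochs):
        prob = 1.0 / (1.0 + np.exp(-(emb @ W + b)))
        # Binary cross-entropy averaged over all peak/cell entries.
        bce = -(acc * np.log(prob + eps) + (1 - acc) * np.log(1 - prob + eps))
        losses.append(float(bce.mean()))
        grad = (prob - acc) / n_peaks                        # gradient w.r.t. logits
        W -= lr * (emb.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W.T, b, losses

# Toy data: 50 peaks of 20 bp each and 6 cells (random, for illustration only).
rng = np.random.default_rng(0)
seqs = ["".join(rng.choice(list(BASES), size=20)) for _ in range(50)]
acc = (rng.random((50, 6)) < 0.2).astype(float)

emb = embed_peaks(seqs, dim=8, rng=rng)
cell_repr, bias, losses = fit_cell_weights(emb, acc)
print(cell_repr.shape)   # one 8-dimensional weight vector per cell
```

In this sketch each row of `cell_repr` plays the role of a cell representation, since it holds the weights that map shared peak embeddings to that cell's accessibility profile. Such vectors could then be passed to a downstream classifier or alignment model.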

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Chromatin Immunoprecipitation Sequencing / methods
  • Chromatin* / genetics
  • Chromatin* / metabolism
  • Computational Biology / methods
  • Genome / genetics
  • Genomics / methods
  • Humans
  • Neoplasms / genetics
  • Single-Cell Analysis* / methods
  • Transposases / genetics
  • Transposases / metabolism

Substances

  • Chromatin
  • Transposases