scAPAtrap: identification and quantification of alternative polyadenylation sites from single-cell RNA-seq data

Brief Bioinform. 2021 Jul 20;22(4):bbaa273. doi: 10.1093/bib/bbaa273.

Abstract

Alternative polyadenylation (APA) generates diverse mRNA isoforms, contributing to transcriptome diversity and gene expression regulation by affecting mRNA stability, translation and localization in cells. The rapid development of 3' tag-based single-cell RNA-sequencing (scRNA-seq) technologies, such as CEL-seq and 10x Genomics, has led to the emergence of computational methods for identifying APA sites and profiling APA dynamics at single-cell resolution. However, existing methods fail to detect the precise location of poly(A) sites or to detect sites with low read coverage. Moreover, they rely on a priori genome annotation and can only detect poly(A) sites located within or near annotated genes. Here we propose a tool called scAPAtrap for detecting poly(A) sites at the whole-genome level in individual cells from 3' tag-based scRNA-seq data. scAPAtrap incorporates peak identification and poly(A) read anchoring, enabling the identification of the precise location of poly(A) sites, even for sites with low read coverage. Moreover, scAPAtrap can identify poly(A) sites without using a priori genome annotation, which helps locate novel poly(A) sites in previously overlooked regions and improve genome annotation. We compared scAPAtrap with two recent methods, scAPA and Sierra, using scRNA-seq data from different experimental technologies and species. Results show that scAPAtrap identified poly(A) sites with higher accuracy and sensitivity than competing methods and could be used to explore APA dynamics among cell types or the heterogeneous APA isoform expression in individual cells. scAPAtrap is available at https://github.com/BMILAB/scAPAtrap.
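
The general idea of poly(A) read anchoring mentioned above can be illustrated with a minimal Python sketch. It is not scAPAtrap's implementation and does not use its API: the function names `anchor_poly_a_site` and `call_sites`, the thresholds `min_tail` and `min_support`, and the simplifying assumptions (plus-strand reads, ungapped alignment, the untemplated A-tail still present at the end of the read) are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Illustrative only: hypothetical helpers, not part of scAPAtrap.

def anchor_poly_a_site(read_seq, align_start, min_tail=8):
    """Return the 0-based coordinate of the last templated base before a
    trailing run of A's, or None if the read has no usable poly(A) tail.

    Assumes a plus-strand read aligned without gaps starting at align_start.
    """
    match = re.search(r"A+$", read_seq)
    if match is None or len(match.group()) < min_tail:
        return None  # no tail, or tail too short to trust
    templated_len = match.start()
    if templated_len == 0:
        return None  # read is entirely A's; nothing to anchor
    return align_start + templated_len - 1


def call_sites(reads, min_support=2, min_tail=8):
    """Count anchored positions across (sequence, start) pairs and keep
    positions supported by at least min_support reads."""
    counts = Counter()
    for seq, start in reads:
        site = anchor_poly_a_site(seq, start, min_tail=min_tail)
        if site is not None:
            counts[site] += 1
    return {pos: n for pos, n in counts.items() if n >= min_support}


reads = [
    ("ACGTACGT" + "A" * 10, 100),  # tail of 10 A's, anchors a site
    ("GTACGT" + "A" * 9, 102),     # different start, same anchored site
    ("ACGTACGTAAA", 100),          # tail too short, ignored
]
sites = call_sites(reads)
```

In this toy input, two reads with different start positions anchor to the same coordinate, which is why anchoring on the tail junction can give a sharper position than the extent of read coverage alone. A real pipeline would also need to handle minus-strand reads, spliced alignments and internal priming at genomic A-rich stretches.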

Keywords: 3′ untranslated region; RNA processing; alternative polyadenylation; single-cell RNA-seq; software.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Databases, Nucleic Acid*
  • Genome*
  • Molecular Sequence Annotation
  • RNA 3' Polyadenylation Signals*
  • RNA-Seq*
  • Single-Cell Analysis*
  • Software*