SCAFE: a software suite for analysis of transcribed cis-regulatory elements in single cells

Bioinformatics. 2022 Nov 15;38(22):5126-5128. doi: 10.1093/bioinformatics/btac644.

Abstract

Motivation: Cell type-specific activities of cis-regulatory elements (CRE) are central to understanding gene regulation and disease predisposition. Single-cell RNA 5'end sequencing (sc-end5-seq) captures the transcription start sites (TSS) which can be used as a proxy to measure the activity of transcribed CREs (tCREs). However, a substantial fraction of TSS identified from sc-end5-seq data may not be genuine due to various artifacts, hindering the use of sc-end5-seq for de novo discovery of tCREs.

Results: We developed SCAFE-Single-Cell Analysis of Five-prime Ends-a software suite that processes sc-end5-seq data to de novo identify TSS clusters based on multiple logistic regression. It annotates tCREs based on the identified TSS clusters and generates a tCRE-by-cell count matrix for downstream analyses. The software suite consists of a set of flexible tools that could either be run independently or as pre-configured workflows.

Availability and implementation: SCAFE is implemented in Perl and R. The source code and documentation are freely available for download under the MIT License from https://github.com/chung-lab/SCAFE. Docker images are available from https://hub.docker.com/r/cchon/scafe. The submitted software version and test data are archived at https://doi.org/10.5281/zenodo.7023163 and https://doi.org/10.5281/zenodo.7024060, respectively.

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Regulatory Sequences, Nucleic Acid*
  • Software*
  • Transcription Initiation Site
  • Workflow