SCReadCounts: estimation of cell-level SNVs expression from scRNA-seq data

BMC Genomics. 2021 Sep 22;22(1):689. doi: 10.1186/s12864-021-07974-8.

Abstract

Background: Recent studies have demonstrated the utility of SNVs detected in scRNA-seq data for distinguishing tumor from normal cells, characterizing intra-tumoral heterogeneity, and defining mutation-associated expression signatures. Beyond cancer studies, SNVs from single cells have been useful in studies of transcriptional burst kinetics, allelic expression, chromosome X inactivation, ploidy estimation, and haplotype inference.

Results: To aid these types of studies, we have developed a tool, SCReadCounts, for cell-level tabulation of the sequencing read counts bearing SNV reference and variant alleles from barcoded scRNA-seq alignments. Given genomic loci and expected alleles, SCReadCounts generates cell-SNV matrices with the absolute variant- and reference-harboring read counts, as well as cell-SNV matrices of the expressed Variant Allele Fraction (VAF_RNA), suitable for a variety of downstream applications. We demonstrate three different SCReadCounts applications on 59,884 cells from seven neuroblastoma samples: (1) estimation of cell-level expression of known somatic mutations and RNA-editing sites, (2) estimation of cell-level allele expression of biallelic SNVs, and (3) a discovery-mode assessment of the reference and each of the three alternative nucleotides at genomic positions of interest, which does not require prior SNV information. For the latter, we applied SCReadCounts to the coding regions of KRAS, where it identified known and novel somatic mutations in a low-to-moderate proportion of cells. The SCReadCounts read-counts module is benchmarked against the analogous modules of GATK and Samtools. SCReadCounts is freely available (https://github.com/HorvathLab/NGS) as 64-bit self-contained binary distributions for Linux and MacOS, in addition to Python source.
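The tabulation described above can be illustrated with a minimal sketch. It is not the SCReadCounts implementation, which reads barcoded alignments directly; instead it assumes a hypothetical, already-extracted stream of (cell_barcode, base_at_position) pairs, one per read covering a single locus. It tallies A/C/G/T per cell and derives one locus' entries of the reference-count, variant-count and VAF_RNA matrices, assuming the usual definition VAF_RNA = n_var / (n_ref + n_var). The function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

NUCLEOTIDES = ("A", "C", "G", "T")


def tally_bases(observations: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count A/C/G/T per cell barcode at one genomic position.

    `observations` is a hypothetical pre-extracted stream of
    (cell_barcode, base_at_position) pairs, one per aligned read.
    """
    per_cell: Dict[str, Counter] = defaultdict(Counter)
    for barcode, base in observations:
        base = base.upper()
        if base in NUCLEOTIDES:  # ignore N, deletions and other non-ACGT calls
            per_cell[barcode][base] += 1
    return per_cell


def vaf_rna(n_ref: int, n_var: int) -> Optional[float]:
    """Expressed variant allele fraction; None when the cell has no informative reads."""
    total = n_ref + n_var
    return n_var / total if total else None


def cell_snv_row(per_cell: Dict[str, Counter], barcodes: List[str],
                 ref: str, alt: str) -> Dict[str, Tuple[int, int, Optional[float]]]:
    """Entries for one locus across all cells: (n_ref, n_var, VAF_RNA).

    Cells absent from `per_cell` get (0, 0, None), so "no reads" stays
    distinguishable from VAF_RNA == 0.0 (reference-only reads).
    """
    row = {}
    for bc in barcodes:
        counts = per_cell.get(bc, Counter())
        n_ref, n_var = counts[ref], counts[alt]
        row[bc] = (n_ref, n_var, vaf_rna(n_ref, n_var))
    return row


if __name__ == "__main__":
    obs = [("AAAC", "A"), ("AAAC", "G"), ("AAAC", "A"),
           ("TTTG", "a"), ("CCCA", "G"), ("CCCA", "N")]
    tally = tally_bases(obs)
    print(cell_snv_row(tally, ["AAAC", "TTTG", "CCCA", "GGGT"], ref="A", alt="G"))
    # {'AAAC': (2, 1, 0.3333333333333333), 'TTTG': (1, 0, 0.0), 'CCCA': (0, 1, 1.0), 'GGGT': (0, 0, None)}
```

A discovery-mode pass in the same spirit would call `cell_snv_row` once for each of the three nucleotides in `NUCLEOTIDES` other than the reference, so no prior SNV list is needed.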

Conclusions: SCReadCounts provides a fast and efficient solution for estimating cell-level SNV expression from scRNA-seq data. SCReadCounts makes it possible to distinguish cells with monoallelic reference expression from those with no gene expression, and it can be applied to SNVs present in only a small proportion of the cells, such as somatic mutations in cancer.
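The distinction drawn above can be illustrated with a small sketch that labels a cell at one SNV from its reference and variant read counts. The labels and the `min_reads` coverage threshold are hypothetical illustrations, not SCReadCounts options. Note also that zero reads at a locus is consistent with absent expression but, given scRNA-seq dropout, does not prove it.

```python
def classify_cell(n_ref: int, n_var: int, min_reads: int = 1) -> str:
    """Label one cell at one SNV from its read counts.

    `min_reads` is a hypothetical coverage threshold for illustration only.
    """
    total = n_ref + n_var
    if total == 0:
        return "no reads"               # locus not detected in this cell
    if total < min_reads:
        return "low coverage"           # too few reads to call allele expression
    if n_var == 0:
        return "monoallelic reference"  # expressed, but only the reference allele seen
    if n_ref == 0:
        return "monoallelic variant"    # expressed, but only the variant allele seen
    return "biallelic"


if __name__ == "__main__":
    for counts in [(0, 0), (5, 0), (0, 3), (4, 2)]:
        print(counts, classify_cell(*counts))
```

Keeping "no reads" separate from "monoallelic reference" is the point: both would otherwise show zero variant reads.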

Keywords: Allele; Allele expression; Mutation; SNP; SNV; Single cell; Single cell RNA sequencing; Somatic mutation; scRNA-seq.

MeSH terms

  • Polymorphism, Single Nucleotide
  • RNA
  • RNA, Small Cytoplasmic*
  • Sequence Analysis, RNA
  • Single-Cell Analysis
  • Software

Substances

  • RNA, Small Cytoplasmic
  • RNA