Variant calling enhances the identification of cancer cells in single-cell RNA sequencing data

PLoS Comput Biol. 2022 Oct 3;18(10):e1010576. doi: 10.1371/journal.pcbi.1010576. eCollection 2022 Oct.


Single-cell RNA-sequencing is an invaluable research tool that allows for the investigation of gene expression in heterogeneous cancer cell populations in ways that bulk RNA-seq cannot. However, normal (i.e., non-tumor) cells in cancer samples can confound the downstream analysis of single-cell RNA-seq data. Existing methods for identifying cancer and normal cells include copy number variation inference, marker-gene expression analysis, and expression-based clustering. This work aims to extend these approaches to identifying cancer cells in single-cell RNA-seq samples by incorporating variant calling and the identification of putative driver alterations. We found that putative driver alterations can be detected in single-cell RNA-seq data obtained with full-length transcript technologies and observed that a subset of cells in tumor samples is enriched for putative driver alterations compared to normal cells. Furthermore, we show that the number of putative driver alterations and inferred copy number variation are not correlated in all samples. Taken together, our findings suggest that augmenting existing cancer-cell filtering methods with variant calling and analysis can increase the number of tumor cells that can be confidently included in downstream analyses of single-cell full-length transcript RNA-seq datasets.
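The idea of augmenting copy-number-based filtering with variant calls can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Cell` fields, the thresholds (`cnv_threshold`, `min_drivers`), the OR-style combination rule, and the toy values are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    cell_id: str
    cnv_score: float         # hypothetical summary of inferred copy number variation
    driver_alterations: int  # hypothetical count of putative driver alterations called


def label_cells(cells, cnv_threshold=0.2, min_drivers=1):
    """Label each cell as tumor by CNV alone and by CNV or variant evidence.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    labels = {}
    for cell in cells:
        cnv_tumor = cell.cnv_score >= cnv_threshold
        variant_tumor = cell.driver_alterations >= min_drivers
        labels[cell.cell_id] = {
            "cnv_only": cnv_tumor,
            # Assumed rule: either line of evidence is enough to keep the cell.
            "combined": cnv_tumor or variant_tumor,
        }
    return labels


def count_tumor(labels, key):
    """Count cells labeled as tumor under the given rule."""
    return sum(1 for flags in labels.values() if flags[key])


# Toy cells with made-up values, not real data.
cells = [
    Cell("c1", cnv_score=0.35, driver_alterations=2),
    Cell("c2", cnv_score=0.05, driver_alterations=1),
    Cell("c3", cnv_score=0.30, driver_alterations=0),
    Cell("c4", cnv_score=0.02, driver_alterations=0),
]
labels = label_cells(cells)
print(count_tumor(labels, "cnv_only"), count_tumor(labels, "combined"))
```

In this toy input, cell `c2` has a low CNV score but carries a putative driver alteration, so only the combined rule retains it. That mirrors the abstract's point that the two signals are not always correlated, and that using both may recover additional tumor cells.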

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, N.I.H., Extramural

MeSH terms

  • Cluster Analysis
  • DNA Copy Number Variations* / genetics
  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • Neoplasms* / genetics
  • RNA / genetics
  • RNA-Seq
  • Sequence Analysis, RNA / methods
  • Single-Cell Analysis / methods


Substances

  • RNA

Grant support

This work was partly supported by a Cancer Prevention & Research Institute of Texas Recruitment of First-Time Tenure-Track Faculty Members grant (RR200023) to ML and an R37 NIH National Cancer Institute grant (5R37CA242070) to ML. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.