DropletQC: improved identification of empty droplets and damaged cells in single-cell RNA-seq data

Genome Biol. 2021 Dec 2;22(1):329. doi: 10.1186/s13059-021-02547-0.

Abstract

Background: Advances in droplet-based single-cell RNA-sequencing (scRNA-seq) have dramatically increased throughput, allowing tens of thousands of cells to be routinely sequenced in a single experiment. In addition to cells, droplets capture cell-free "ambient" RNA predominantly caused by lysis of cells during sample preparation. Samples with high ambient RNA concentration can create challenges in accurately distinguishing cell-containing droplets and droplets containing ambient RNA. Current methods to separate these groups often retain a significant number of droplets that do not contain cells or empty droplets. Additionally, there are currently no methods available to detect droplets containing damaged cells, which comprise partially lysed cells, the original source of the ambient RNA.

Results: Here, we describe DropletQC, a new method that is able to detect empty droplets, damaged, and intact cells, and accurately distinguish them from one another. This approach is based on a novel quality control metric, the nuclear fraction, which quantifies for each droplet the fraction of RNA originating from unspliced, nuclear pre-mRNA. We demonstrate how DropletQC provides a powerful extension to existing computational methods for identifying empty droplets such as EmptyDrops.

Conclusions: We implement DropletQC as an R package, which can be easily integrated into existing single-cell analysis workflows.