Embracing the dropouts in single-cell RNA-seq analysis

Nat Commun. 2020 Mar 3;11(1):1169. doi: 10.1038/s41467-020-14976-9.

Abstract

Dropouts, in which the data capture only a small fraction of each cell's transcriptome, are a primary reason that single-cell RNA-seq analysis is challenging. Almost all computational algorithms developed for single-cell RNA-seq adopt gene selection, dimension reduction or imputation to address dropouts. Here, we explore the opposite view. Instead of treating dropouts as a problem to be fixed, we embrace them as a useful signal. We represent the dropout pattern by binarizing single-cell RNA-seq count data, and present a co-occurrence clustering algorithm that clusters cells based on this pattern. We demonstrate in multiple published datasets that the binary dropout pattern is as informative as the quantitative expression of highly variable genes for identifying cell types. We expect that recognizing the utility of dropouts will provide an alternative direction for developing computational algorithms for single-cell RNA-seq analysis.
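The binarization step described in the abstract can be illustrated with a minimal sketch. This is not the paper's co-occurrence clustering algorithm: the function names `binarize_counts` and `jaccard_similarity`, the toy count matrix, and the choice of Jaccard similarity as a way to compare cells are all assumptions made for illustration. The sketch only shows that cells can be compared on which genes are detected, without using the count values themselves.

```python
import numpy as np

def binarize_counts(counts):
    """Return a 0/1 matrix: 1 where a gene was detected (count > 0), 0 otherwise."""
    return (np.asarray(counts) > 0).astype(np.int8)

def jaccard_similarity(binary):
    """Pairwise Jaccard similarity between rows (cells) of a binary matrix.

    Jaccard is a stand-in similarity chosen for this sketch, not the
    co-occurrence measure used in the paper.
    """
    b = binary.astype(np.int64)
    intersection = b @ b.T                      # genes detected in both cells
    detected = b.sum(axis=1)                    # genes detected per cell
    union = detected[:, None] + detected[None, :] - intersection
    # Cells with no detected genes give union == 0; similarity is set to 0 there.
    return np.divide(
        intersection,
        union,
        out=np.zeros(union.shape, dtype=float),
        where=union > 0,
    )

# Hypothetical cells-by-genes count matrix: two cells of one type, two of another.
counts = np.array([
    [5, 0, 3, 0],
    [2, 0, 9, 0],
    [0, 4, 0, 1],
    [0, 7, 0, 2],
])

binary = binarize_counts(counts)
similarity = jaccard_similarity(binary)
```

In this toy example the first two cells share an identical detection pattern despite different counts, so any clustering applied to `similarity` would group them together and apart from the last two cells.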

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Animals
  • Cluster Analysis
  • Databases, Genetic
  • Gene Ontology
  • Humans
  • Leukocytes, Mononuclear / physiology
  • Mice
  • Prefrontal Cortex / physiology
  • Sequence Analysis, RNA / methods
  • Sequence Analysis, RNA / statistics & numerical data*
  • Single-Cell Analysis / methods
  • Single-Cell Analysis / statistics & numerical data*