Statistics or biology: the zero-inflation controversy about scRNA-seq data

Genome Biol. 2022 Jan 21;23(1):31. doi: 10.1186/s13059-022-02601-5.


Researchers view the abundant zeros in single-cell RNA-seq (scRNA-seq) data differently: some regard zeros as biological signals representing no or low gene expression, while others regard zeros as missing data to be corrected. To help address the controversy, here we discuss the sources of biological and non-biological zeros; introduce five mechanisms for adding non-biological zeros in computational benchmarking; evaluate the impact of non-biological zeros on data analysis; benchmark three input data types (observed counts, imputed counts, and binarized counts); discuss the open questions regarding non-biological zeros; and advocate for transparent analysis.
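
To make two of the terms above concrete, the following is a minimal sketch in plain Python, not the authors' code. It binarizes a toy count matrix and adds extra zeros by randomly masking nonzero entries. The matrix values, the function names (`binarize`, `mask_nonzeros`, `zero_fraction`), and the uniform masking rule are all assumptions made for illustration; the abstract does not specify the five mechanisms, so this masking rule should not be read as one of them.

```python
import random

def binarize(counts):
    """Return a matrix with 1 where the count is nonzero and 0 elsewhere."""
    return [[1 if c > 0 else 0 for c in row] for row in counts]

def mask_nonzeros(counts, rate, seed=0):
    """Set each nonzero count to zero independently with probability `rate`.

    Hypothetical masking rule, used here only to illustrate the idea of
    introducing non-biological zeros into a count matrix.
    """
    rng = random.Random(seed)
    return [
        [0 if (c > 0 and rng.random() < rate) else c for c in row]
        for row in counts
    ]

def zero_fraction(counts):
    """Fraction of entries in the matrix that are exactly zero."""
    total = sum(len(row) for row in counts)
    zeros = sum(1 for row in counts for c in row if c == 0)
    return zeros / total

# Toy genes-by-cells count matrix (made-up numbers).
observed = [
    [0, 3, 0, 7],
    [5, 0, 0, 1],
    [2, 2, 0, 0],
]

binarized = binarize(observed)
masked = mask_nonzeros(observed, rate=0.5)

print("zero fraction, observed:", zero_fraction(observed))
print("zero fraction, masked:  ", zero_fraction(masked))
print("binarized:", binarized)
```

Masking can only turn nonzero entries into zeros, so the zero fraction of `masked` is never lower than that of `observed`. Binarization keeps only whether each count is nonzero and discards its magnitude.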

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Review

MeSH terms

  • Benchmarking*
  • Biology
  • Sequence Analysis, RNA
  • Single-Cell Analysis*
  • Whole Exome Sequencing