A clustering-independent method for finding differentially expressed genes in single-cell transcriptome data

Nat Commun. 2020 Aug 28;11(1):4318. doi: 10.1038/s41467-020-17900-3.

Abstract

A common analysis of single-cell sequencing data includes clustering of cells and identifying differentially expressed genes (DEGs). How cell clusters are defined has important consequences for downstream analyses and the interpretation of results, but is often not straightforward. To address this difficulty, we present singleCellHaystack, a method that enables the prediction of DEGs without relying on explicit clustering of cells. Our method uses Kullback-Leibler divergence to find genes that are expressed in subsets of cells that are non-randomly positioned in a multidimensional space. Comparisons with existing DEG prediction approaches on artificial datasets show that singleCellHaystack has higher accuracy. We illustrate the usage of singleCellHaystack through applications on 136 real transcriptome datasets and a spatial transcriptomics dataset. We demonstrate that our method is a fast and accurate approach for DEG prediction in single-cell data. singleCellHaystack is implemented as an R package and is available from CRAN and GitHub.
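
As a rough illustration of the idea in the abstract, the Python sketch below scores a single gene by comparing the spatial distribution of the cells in which it is detected against the distribution of all cells, using Kullback-Leibler divergence. This is not the singleCellHaystack implementation (which is an R package). The function names, the Gaussian kernel, the bandwidth, the pseudocount, the choice of grid points, and the permutation test are all assumptions made for this example.

```python
import math
import random


def density_on_grid(cells, grid, bandwidth=1.0, pseudo=1e-9):
    """Gaussian-kernel density of `cells` at each grid point, normalised to sum to 1.

    The kernel, bandwidth and pseudocount are illustrative assumptions.
    """
    dens = []
    for g in grid:
        total = pseudo  # avoids log(0) when no cell lies near a grid point
        for c in cells:
            d2 = sum((a - b) ** 2 for a, b in zip(c, g))
            total += math.exp(-d2 / (2.0 * bandwidth ** 2))
        dens.append(total)
    s = sum(dens)
    return [d / s for d in dens]


def kl_divergence(p, q):
    """D_KL(P || Q) for two discrete distributions over the same grid points."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def gene_score(coords, detected, grid, bandwidth=1.0):
    """Divergence of the expressing cells' distribution from that of all cells."""
    q = density_on_grid(coords, grid, bandwidth)
    expressing = [c for c, d in zip(coords, detected) if d]
    p = density_on_grid(expressing, grid, bandwidth)
    return kl_divergence(p, q)


def permutation_pvalue(coords, detected, grid, n_perm=200, seed=0):
    """Empirical p-value from shuffling detection labels (a hypothetical helper)."""
    rng = random.Random(seed)
    observed = gene_score(coords, detected, grid)
    labels = list(detected)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if gene_score(coords, labels, grid) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: two groups of cells in a 2-D embedding (synthetic, for illustration only).
rng = random.Random(1)
coords = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(40)] + \
         [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(40)]
grid = [(0, 0), (5, 5), (0, 5), (5, 0), (2.5, 2.5)]
localized = [i < 40 for i in range(80)]      # detected only in the first group
scattered = [i % 2 == 0 for i in range(80)]  # detected evenly across both groups
```

In this toy setting, `gene_score(coords, localized, grid)` is larger than `gene_score(coords, scattered, grid)`, because the localized gene's expressing cells occupy only part of the space while the scattered gene's expressing cells follow the overall cell distribution. No cluster labels are used at any point, which is the property the abstract emphasises.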

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bone Marrow
  • Cluster Analysis
  • Computational Biology / methods*
  • Data Mining
  • Gene Expression
  • Gene Expression Profiling / methods*
  • Gene Regulatory Networks
  • Single-Cell Analysis / methods
  • Software
  • Transcriptome*