Evaluation of single-cell classifiers for single-cell RNA sequencing data sets

Brief Bioinform. 2020 Sep 25;21(5):1581-1595. doi: 10.1093/bib/bbz096.

Abstract

Single-cell RNA sequencing (scRNA-seq) has developed rapidly and is widely applied in biological and medical research. Identifying cell types in scRNA-seq data sets is an essential step before in-depth investigation of their functional and pathological roles. However, the conventional workflow based on clustering and marker genes does not scale to the increasingly large number of scRNA-seq data sets because it relies on complicated procedures and manual annotation. Therefore, a number of tools have recently been developed to predict cell types in new data sets using reference data sets. These methods have not been widely adopted owing to a lack of tool benchmarking and user guidance. In this article, we performed a comprehensive and impartial evaluation of nine classification software tools specifically designed for scRNA-seq data sets. Results showed that Seurat based on random forest, SingleR based on correlation analysis and CaSTLe based on XGBoost performed better than the others. A simple ensemble voting of all tools can improve the predictive accuracy. Under nonideal situations, such as small-sized and class-imbalanced reference data sets, tools based on cluster-level similarities have superior performance. However, even with the function of assigning 'unassigned' labels, it is still challenging to catch novel cell types using any single-cell classifier alone. This article provides a guideline for researchers to select and apply suitable classification tools in their analysis workflows and sheds some light on potential directions for future improvement of classification tools.
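
The ensemble voting mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the voting scheme, so the strict-majority rule, the `min_agreement` threshold, the function name `majority_vote` and the tool names `tool_a`, `tool_b` and `tool_c` below are all assumptions made for illustration, not the authors' implementation. Each tool contributes one predicted label per cell, and a cell falls back to 'unassigned' when no label reaches the required share of votes.

```python
from collections import Counter

UNASSIGNED = "unassigned"


def majority_vote(predictions_per_tool, min_agreement=0.5):
    """Combine per-cell labels from several classifiers by voting.

    predictions_per_tool: dict mapping a tool name to a list of labels,
        one label per cell, in the same cell order for every tool.
    min_agreement: assumed threshold; a label is accepted only if strictly
        more than this fraction of tools voted for it.
    """
    if not predictions_per_tool:
        raise ValueError("at least one tool is required")

    label_lists = list(predictions_per_tool.values())
    n_tools = len(label_lists)
    n_cells = len(label_lists[0])
    if any(len(labels) != n_cells for labels in label_lists):
        raise ValueError("all tools must label the same number of cells")

    consensus = []
    for cell_votes in zip(*label_lists):
        # Count how many tools predicted each label for this cell.
        label, count = Counter(cell_votes).most_common(1)[0]
        if count > min_agreement * n_tools:
            consensus.append(label)
        else:
            # No label has enough support (this also covers ties).
            consensus.append(UNASSIGNED)
    return consensus


# Hypothetical predictions for three cells from three tools.
example = {
    "tool_a": ["B cell", "T cell", "NK cell"],
    "tool_b": ["B cell", "T cell", "monocyte"],
    "tool_c": ["B cell", "unassigned", "T cell"],
}
print(majority_vote(example))
```

In this sketch an 'unassigned' prediction counts as an ordinary vote, so it can itself win the majority. Whether to exclude such votes or to weight tools by their benchmark accuracy is a design choice left open here.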

Keywords: benchmark; classification; comparative analysis; single-cell RNA-seq.

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Algorithms
  • Cluster Analysis
  • Datasets as Topic
  • HEK293 Cells
  • Humans
  • K562 Cells
  • Leukocytes, Mononuclear / metabolism
  • Pancreas / metabolism
  • Sequence Analysis, RNA / methods*
  • Single-Cell Analysis / methods*
  • Software