scAnnotatR: framework to accurately classify cell types in single-cell RNA-sequencing data

BMC Bioinformatics. 2022 Jan 17;23(1):44. doi: 10.1186/s12859-022-04574-5.

Abstract

Background: Automatic cell type identification is essential to alleviate a key bottleneck in scRNA-seq data analysis. While most existing classification tools show good sensitivity and specificity, they often fail to leave cells unclassified when their cell type is missing from the reference used. Additionally, many tools do not scale to the continuously increasing size of current scRNA-seq datasets. Therefore, additional tools are needed to address these challenges.

Results: scAnnotatR is a novel R package that provides a complete framework to classify cells in scRNA-seq datasets using pre-trained classifiers. It supports both Seurat and Bioconductor's SingleCellExperiment and is thereby compatible with the vast majority of R-based analysis workflows. scAnnotatR uses hierarchically organised support vector machines (SVMs), each trained to distinguish one specific cell type from all others. It shows comparable or even superior accuracy, sensitivity and specificity compared to existing tools, while being able to leave unknown cell types unclassified. Moreover, scAnnotatR is the only one of the best-performing tools able to process datasets containing more than 600,000 cells.
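The decision logic described above (one binary "this cell type versus all others" classifier per type, arranged in a hierarchy, with cells left unclassified when no classifier accepts them) can be illustrated with a minimal Python sketch. This is not scAnnotatR's R API or its trained models: the `BinaryClassifier` class, the `classify_cell` function, the marker weights, the sigmoid mapping from score to probability and the 0.5 threshold are all assumptions made for illustration, with a hand-written linear scorer standing in for each pre-trained SVM.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class BinaryClassifier:
    """Hypothetical stand-in for one pre-trained 'cell type vs. all others' model."""
    cell_type: str
    weights: Dict[str, float]        # gene -> weight (illustrative values, not learned)
    bias: float = 0.0
    parent: Optional[str] = None     # parent cell type in the hierarchy, if any

    def probability(self, expression: Dict[str, float]) -> float:
        # Linear decision score squashed to (0, 1); a real SVM would be trained on data.
        score = self.bias + sum(w * expression.get(g, 0.0) for g, w in self.weights.items())
        return 1.0 / (1.0 + math.exp(-score))


def classify_cell(expression: Dict[str, float],
                  classifiers: List[BinaryClassifier],
                  threshold: float = 0.5) -> List[str]:
    """Return every accepted cell type, or ["unknown"] if no classifier accepts the cell."""
    by_type = {c.cell_type: c for c in classifiers}
    accepted: Dict[str, bool] = {}

    def is_accepted(cell_type: str) -> bool:
        # A child classifier is only consulted if its parent accepted the cell.
        if cell_type not in accepted:
            clf = by_type[cell_type]
            parent_ok = clf.parent is None or is_accepted(clf.parent)
            accepted[cell_type] = parent_ok and clf.probability(expression) >= threshold
        return accepted[cell_type]

    labels = [c.cell_type for c in classifiers if is_accepted(c.cell_type)]
    return labels if labels else ["unknown"]


# Illustrative classifiers; gene weights are made up for the example.
EXAMPLE_CLASSIFIERS = [
    BinaryClassifier("T cell", {"CD3E": 2.0, "MS4A1": -2.0}, bias=-1.0),
    BinaryClassifier("CD8 T cell", {"CD8A": 2.0}, bias=-1.0, parent="T cell"),
    BinaryClassifier("B cell", {"MS4A1": 2.0, "CD3E": -2.0}, bias=-1.0),
]

if __name__ == "__main__":
    print(classify_cell({"CD3E": 2.0, "CD8A": 2.0}, EXAMPLE_CLASSIFIERS))
    print(classify_cell({"MS4A1": 2.0, "CD8A": 2.0}, EXAMPLE_CLASSIFIERS))
    print(classify_cell({"EPCAM": 3.0}, EXAMPLE_CLASSIFIERS))
```

In this sketch, a cell expressing CD8A but not CD3E is not labelled as a CD8 T cell, because the child classifier is gated by its parent, and a cell expressing none of the modelled markers falls through to "unknown" rather than being forced into the nearest class.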

Conclusions: scAnnotatR is freely available on GitHub (https://github.com/grisslab/scAnnotatR) and through Bioconductor (from version 3.14). It is consistently among the best-performing tools in terms of classification accuracy while scaling to the largest datasets.

Keywords: Bioconductor; Cell classification; Machine learning; R; SVM; scAnnotatR; scRNAseq.

MeSH terms

  • RNA* / genetics
  • Sequence Analysis, RNA
  • Single-Cell Analysis*
  • Whole Exome Sequencing

Substances

  • RNA