Background: Automatic cell type identification is essential to alleviate a key bottleneck in scRNA-seq data analysis. While most existing classification tools show good sensitivity and specificity, they often fail to adequately not-classify cells that are missing in the used reference. Additionally, many tools do not scale to the continuously increasing size of current scRNA-seq datasets. Therefore, additional tools are needed to solve these challenges.
Results: scAnnotatR is a novel R package that provides a complete framework to classify cells in scRNA-seq datasets using pre-trained classifiers. It supports both Seurat and Bioconductor's SingleCellExperiment and is thereby compatible with the vast majority of R-based analysis workflows. scAnnotatR uses hierarchically organised SVMs to distinguish a specific cell type versus all others. It shows comparable or even superior accuracy, sensitivity and specificity compared to existing tools while being able to not-classify unknown cell types. Moreover, scAnnotatR is the only of the best performing tools able to process datasets containing more than 600,000 cells.
Conclusions: scAnnotatR is freely available on GitHub ( https://github.com/grisslab/scAnnotatR ) and through Bioconductor (from version 3.14). It is consistently among the best performing tools in terms of classification accuracy while scaling to the largest datasets.
Keywords: Bioconductor; Cell classification; Machine learning; R; SVM; scAnnotatR; scRNAseq.
© 2022. The Author(s).