Background: Single-cell RNA sequencing (scRNAseq) has provided invaluable insights into cellular heterogeneity and functional states in health and disease. Annotating the biological identity of cell clusters is an important step before downstream analyses of scRNAseq data, and it remains technically challenging. Current solutions for annotating single-cell clusters generally lack a graphical user interface, can be computationally intensive, or have a limited scope. Manual annotation by examining the expression of marker genes, on the other hand, can be subjective and labor-intensive. To improve the quality and efficiency of annotating cell clusters in scRNAseq data, we present Cluster Identity PRedictor (CIPR), a web-based R/Shiny app and R package that provides a graphical user interface to quickly score gene expression profiles of unknown cell clusters against mouse or human references, or against a custom dataset provided by the user. CIPR can be easily integrated into existing pipelines to facilitate scRNAseq data analysis.
Results: CIPR employs multiple approaches for calculating the identity score at the cluster level and can accept inputs generated by popular scRNAseq analysis software. CIPR provides 2 mouse and 5 human reference datasets; its pipeline allows inter-species comparisons and lets users upload a custom reference dataset for specialized studies. The options to filter out lowly variable genes and to exclude irrelevant reference cell subsets from the analysis can improve the discriminatory power of CIPR, suggesting that it can be tailored to different experimental contexts. Benchmarking CIPR against existing functionally similar software revealed that our algorithm is less computationally demanding, performs significantly faster, and provides accurate predictions for multiple cell clusters in a scRNAseq experiment involving tumor-infiltrating immune cells.
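As an illustration of the general idea in the Results paragraph above (scoring an unknown cluster against reference profiles after filtering out lowly variable genes), the following is a minimal Python sketch. It is not CIPR's implementation: CIPR is an R/Shiny tool, and the abstract does not specify its scoring methods. The sketch assumes Pearson correlation as the similarity measure, and the function names (`pearson`, `filter_variable_genes`, `score_cluster`) and all expression values are hypothetical, made up for the example.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def filter_variable_genes(reference, keep_fraction):
    """Keep the top fraction of genes ranked by variance across reference cell types."""
    genes = sorted(next(iter(reference.values())))
    variance = {g: pvariance([p[g] for p in reference.values()]) for g in genes}
    n_keep = max(2, round(len(genes) * keep_fraction))
    return sorted(genes, key=lambda g: variance[g], reverse=True)[:n_keep]


def score_cluster(cluster_profile, reference, genes):
    """Correlate one cluster's expression profile with every reference cell type."""
    shared = [g for g in genes if g in cluster_profile]
    return {
        name: pearson([cluster_profile[g] for g in shared],
                      [profile[g] for g in shared])
        for name, profile in reference.items()
    }


# Toy reference: made-up average expression per cell type (not real data).
reference = {
    "T cell":     {"Cd3e": 8.0, "Cd19": 0.5, "Lyz2": 0.3, "Actb": 9.0},
    "B cell":     {"Cd3e": 0.4, "Cd19": 7.5, "Lyz2": 0.2, "Actb": 9.1},
    "Macrophage": {"Cd3e": 0.3, "Cd19": 0.2, "Lyz2": 8.5, "Actb": 8.9},
}
# Made-up average expression of one unannotated cluster.
unknown_cluster = {"Cd3e": 7.2, "Cd19": 0.8, "Lyz2": 0.6, "Actb": 9.0}

genes = filter_variable_genes(reference, keep_fraction=0.75)
scores = score_cluster(unknown_cluster, reference, genes)
best = max(scores, key=scores.get)
print(best, scores)
```

In this toy setup, the gene that barely varies across reference cell types (here the housekeeping gene `Actb`) is dropped before scoring, so the comparison rests on the genes that actually distinguish the reference subsets. The same reasoning motivates the variance filter described above.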
Conclusions: CIPR facilitates scRNAseq data analysis by annotating unknown cell clusters in an objective and efficient manner. Platform independence owing to the Shiny framework and the minimal programming experience required allow this software to be used by researchers from different backgrounds. CIPR can accurately predict the identity of a variety of cell clusters and can be used in various experimental contexts across a broad spectrum of research areas.
Keywords: Cluster analysis; Gene expression profiling; Identity prediction; Immune cells; Similarity; Single cell RNA-sequencing.
Conflict of interest statement
The authors declare that they have no competing interests.