Single-cell RNA sequencing (scRNA-seq) is one of the most efficient technologies for human tumor research. However, data analysis is still faced with technical challenges, especially the difficulty in efficiently and accurately discriminating cancer/normal cells in the scRNA-seq expression matrix. If we can address these challenges, we can have a deeper understanding of the intratumoral and intertumoral heterogeneity. In this study, we developed a cancer/normal cell discrimination pipeline called pan-Cancer Seeker (CaSee) devoted to scRNA-seq expression matrix, which is based on the traditional high-quality pan-cancer bulk sequencing data using transfer learning. CaSee is the first tool directly used to discriminate cancer/normal cells in the scRNA-seq expression matrix, with much wider application fields and higher efficiency than copy number variation (CNV) method which requires corresponding reference cells. CaSee is user-friendly and can adapt to a variety of data sources, including but not limited to scRNA tissue sequencing data, scRNA cell line sequencing data, scRNA xenograft cell sequencing data and scRNA circulating tumor cell sequencing data. It is compatible with mainstream sequencing technology platforms, 10× Genomics Chromium, Smart-seq2, and Microwell-seq. Here, CaSee pipeline exhibited excellent performance in the multicenter data evaluation of 11 retrospective cohorts and one independent dataset, with an average discrimination accuracy of 96.69%. In general, the development of a deep-learning based, pan-cancer cell discrimination model, CaSee, to distinguish cancer cells from normal cells will be compelling to researchers working in the genomics, cancer, and single-cell fields.
© 2022. The Author(s), under exclusive licence to Springer Nature Limited.