Integrating single-cell datasets with ambiguous batch information by incorporating molecular network features

Brief Bioinform. 2021 Sep 21;bbab366. doi: 10.1093/bib/bbab366. Online ahead of print.

Abstract

With the rapid development of single-cell sequencing techniques, several large-scale cell atlas projects have been launched across the world. However, it remains challenging to integrate single-cell RNA-seq (scRNA-seq) datasets that come from diverse tissue sources or developmental stages, or that have few overlaps, because the batch information on which current batch-effect correction methods depend is ambiguous in such cases. Here, we present SCORE, a simple network-based integration methodology that incorporates curated molecular network features to infer cellular states and generate a unified workflow for integrating scRNA-seq datasets. In validation on real single-cell datasets, we showed that SCORE outperforms existing methods in accuracy, robustness, scalability and data integration, regardless of batch information.

Keywords: data integration; molecular network; protein–protein interaction; single-cell RNA-seq.
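
To make the idea of "molecular network features" concrete, the sketch below shows one generic way to derive per-cell features from a gene network without using any batch label: each gene's expression is replaced by the mean over that gene and its network neighbours. This is an illustrative assumption, not the SCORE algorithm, which the abstract does not specify. The function name `network_smoothed_features`, the toy gene names and the edge list are all hypothetical.

```python
def network_smoothed_features(expr, genes, edges):
    """Average each gene's expression with that of its network neighbours.

    expr  : list of cells, each a list of expression values (one per gene)
    genes : list of gene names, in the same order as the columns of expr
    edges : list of (gene_a, gene_b) pairs, e.g. from a curated PPI network

    Illustrative only: this is generic neighbourhood smoothing,
    not the procedure used by SCORE.
    """
    index = {gene: i for i, gene in enumerate(genes)}

    # Every gene starts with itself as a neighbour so it keeps its own signal.
    neighbours = [{i} for i in range(len(genes))]
    for a, b in edges:
        # Ignore edges that mention genes absent from the expression matrix.
        if a in index and b in index:
            i, j = index[a], index[b]
            neighbours[i].add(j)
            neighbours[j].add(i)

    # One smoothed value per gene per cell: the mean over the neighbourhood.
    return [
        [sum(cell[k] for k in nb) / len(nb) for nb in neighbours]
        for cell in expr
    ]


# Toy input: one cell, three genes, a single A-B interaction.
toy_genes = ["A", "B", "C"]
toy_edges = [("A", "B"), ("A", "Z")]  # "Z" is not measured, so that edge is skipped
toy_expr = [[2.0, 4.0, 6.0]]
toy_features = network_smoothed_features(toy_expr, toy_genes, toy_edges)
```

Because the features depend only on the expression values and a fixed reference network, the same transformation can be applied to every dataset, whether or not its batch structure is known.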