scDEA: differential expression analysis in single-cell RNA-sequencing data via ensemble learning

Brief Bioinform. 2021 Sep 25;bbab402. doi: 10.1093/bib/bbab402. Online ahead of print.

Abstract

The identification of differentially expressed genes between cell groups is a crucial step in analyzing single-cell RNA-sequencing (scRNA-seq) data. Although various differential expression analysis methods for scRNA-seq data have recently been proposed, based on different model assumptions and strategies, the differentially expressed genes they identify differ considerably from one another, and their performance depends on the underlying data structure. In this paper, we propose scDEA, a new ensemble learning-based differential expression analysis method designed to produce more stable and accurate results. For each gene, scDEA integrates the P-values obtained from 12 individual differential expression analysis methods using a P-value combination method. Comprehensive experiments show that scDEA outperforms state-of-the-art individual methods across different experimental settings and evaluation metrics. We expect scDEA to serve a wide range of users, including biologists, bioinformaticians and data scientists, who need to detect differentially expressed genes in scRNA-seq data.
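The abstract does not say which P-value combination method scDEA uses. Purely as an illustration of the general idea (one combined P-value per gene from several per-method P-values), the sketch below swaps in Fisher's method, a classical choice: for k P-values, the statistic X = -2 Σ ln p_i follows a chi-squared distribution with 2k degrees of freedom under the null hypothesis, assuming independent P-values. The function name `fisher_combine`, the gene names and the per-method P-values are hypothetical and are not part of scDEA.

```python
import math
from typing import Sequence


def fisher_combine(pvalues: Sequence[float], floor: float = 1e-300) -> float:
    """Combine k P-values with Fisher's method (illustrative stand-in, not scDEA's own)."""
    if not pvalues:
        raise ValueError("need at least one P-value")
    k = len(pvalues)
    # half_x is half of Fisher's statistic X = -2 * sum(ln p_i).
    # P-values are floored at `floor` so that p = 0 does not cause log(0).
    half_x = 0.0
    for p in pvalues:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"P-value out of range: {p}")
        half_x -= math.log(max(p, floor))
    # Under H0, X ~ chi-squared with 2k degrees of freedom. For even df the
    # survival function has the closed form exp(-h) * sum_{j=0}^{k-1} h^j / j!,
    # with h = X / 2; terms are built iteratively to avoid large powers.
    term = math.exp(-half_x)
    total = term
    for j in range(1, k):
        term *= half_x / j
        total += term
    # Clamp tiny floating-point overshoot above 1.
    return min(total, 1.0)


# Hypothetical P-values for three genes, one entry per (hypothetical) DE method.
pvals_by_gene = {
    "geneA": [0.001, 0.004, 0.02],
    "geneB": [0.40, 0.75, 0.55],
    "geneC": [0.03, 0.20, 0.01],
}
combined = {gene: fisher_combine(ps) for gene, ps in pvals_by_gene.items()}
for gene, p in combined.items():
    print(f"{gene}: combined P = {p:.3g}")
```

Genes whose P-values are small across most methods receive the smallest combined P-values. Note that P-values from different methods applied to the same data are correlated, which Fisher's independence assumption does not account for; a weighted or correlation-aware combination may be preferable in practice.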

Keywords: differential expression analysis; ensemble learning; scRNA-seq.