Single-cell RNA sequencing (scRNA-seq) data exhibit an unusually large proportion of zero counts, a considerable fraction of which are due to dropout events, and this complicates differential expression analysis. Because the dropout rate decreases as the underlying gene expression level increases, dropouts in scRNA-seq data are informative. To correct the resulting bias in differential expression analysis, an inverse non-dropout-probability weighting method is proposed. The weights are estimated by maximum likelihood, with the dropout values integrated out using Gauss-Hermite quadrature. Linear, generalized linear and mixed-effects regressions weighted by the estimated weights are then fitted to the original or transformed scRNA-seq data, and the variances of the coefficient estimators from these weighted regressions are estimated using the jackknife method. Extensive simulation studies compare the proposed method with five state-of-the-art methods (Limma, edgeR, MAST, ZIAQ and scImpute); the proposed method performs among the best in all scenarios in terms of AUC, sensitivity, specificity and false discovery rate (FDR). The true-positive detection rate of the proposed method and the five comparison methods is examined on mouse embryonic stem cells and fibroblasts, using as true positives the differentially expressed (DE) genes detected in bulk RNA-seq data, from an independent source, on the same set of genes under the same conditions. Specificity is compared across these methods on true-negative data obtained by randomly splitting a real dataset. Finally, the proposed method is illustrated on a lineage study in which cells from the same embryo are correlated, and genes differentially expressed between cell-division lineages are identified.
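The abstract does not spell out the dropout model behind the weights, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. For a single gene it assumes that the latent log-expression is Y ~ N(mu, sigma^2), that a cell drops out with probability expit(a + b*Y) (a negative b encodes the negative dependence of dropout on expression), and that every observed zero is a dropout. The likelihood of a dropout integrates the latent value out with Gauss-Hermite quadrature, and each non-dropout cell receives the weight 1 / P(no dropout | y). The names `gh_expectation`, `neg_log_lik` and `fit_nondropout_weights` are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss
from scipy.optimize import minimize
from scipy.special import expit
from scipy.stats import norm


def gh_expectation(f, mu, sigma, n_nodes=30):
    """Approximate E[f(Y)] for Y ~ N(mu, sigma^2) by Gauss-Hermite quadrature."""
    x, w = hermgauss(n_nodes)
    # Substituting y = mu + sqrt(2)*sigma*x maps the normal density onto the
    # Hermite weight exp(-x^2); dividing by sqrt(pi) normalizes the result.
    return np.sum(w * f(mu + np.sqrt(2.0) * sigma * x)) / np.sqrt(np.pi)


def neg_log_lik(theta, y_obs, n_drop, n_nodes=30):
    """Negative log-likelihood of the hypothetical normal + logistic-dropout model."""
    mu, log_sigma, a, b = theta
    sigma = np.exp(log_sigma)
    # Non-dropout cells: density of the observed value times P(no dropout | y),
    # with log(1 - expit(z)) written stably as -logaddexp(0, z).
    ll_obs = norm.logpdf(y_obs, mu, sigma) - np.logaddexp(0.0, a + b * y_obs)
    # Dropout cells: the unobserved expression value is integrated out.
    p_drop = gh_expectation(lambda t: expit(a + b * t), mu, sigma, n_nodes)
    return -(ll_obs.sum() + n_drop * np.log(max(p_drop, 1e-300)))


def fit_nondropout_weights(y, n_nodes=30):
    """Fit the model by maximum likelihood; return weights 1 / P(no dropout | y)."""
    y = np.asarray(y, dtype=float)
    y_obs = y[y != 0]              # assumption: every zero is a dropout
    n_drop = int(np.sum(y == 0))
    start = np.array([y_obs.mean(), np.log(y_obs.std()), 0.0, -1.0])
    res = minimize(neg_log_lik, start, args=(y_obs, n_drop, n_nodes),
                   method="Nelder-Mead", options={"maxiter": 5000})
    mu, log_sigma, a, b = res.x
    # 1 / (1 - expit(z)) equals 1 + exp(z), so every weight is at least 1.
    weights = 1.0 + np.exp(a + b * y_obs)
    return weights, res.x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latent = rng.normal(2.0, 1.0, size=500)                 # simulated log-expression
    dropped = rng.random(500) < expit(0.5 - 1.5 * latent)   # low expression drops out more
    y = np.where(dropped, 0.0, latent)
    weights, theta = fit_nondropout_weights(y)
    print("estimated (mu, log_sigma, a, b):", np.round(theta, 2))
```

Cells with low observed expression get the largest weights, because under this assumed model they are the ones most likely to have had similar cells drop out; the weighted regression then up-weights them to compensate.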
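The abstract states that coefficient variances from the weighted regressions are estimated by the jackknife. A minimal sketch of the delete-one jackknife for a weighted linear regression follows; it assumes the weights are already computed and holds them fixed, whereas a fuller treatment might re-estimate them within each leave-one-out fit to reflect their estimation uncertainty (the abstract does not say which is done). The functions `wls_coef` and `jackknife_variance` are hypothetical names, and the generalized linear and mixed-effects cases are not covered here.

```python
import numpy as np


def wls_coef(X, y, w):
    """Weighted least squares: minimize sum_i w_i * (y_i - x_i' beta)^2."""
    sw = np.sqrt(w)
    # Rescaling rows by sqrt(w_i) turns WLS into an ordinary least-squares problem.
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


def jackknife_variance(X, y, w):
    """Delete-one jackknife variance of each weighted LS coefficient."""
    n = X.shape[0]
    keep = np.ones(n, dtype=bool)
    betas = []
    for i in range(n):
        keep[i] = False                      # leave observation i out
        betas.append(wls_coef(X[keep], y[keep], w[keep]))
        keep[i] = True
    betas = np.array(betas)
    centered = betas - betas.mean(axis=0)
    # Jackknife formula: (n - 1) / n * sum_i (beta_(-i) - mean of beta_(-i))^2
    return (n - 1) / n * np.sum(centered ** 2, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    group = np.repeat([0.0, 1.0], 20)                 # two conditions, 20 cells each
    X = np.column_stack([np.ones(40), group])         # intercept + condition effect
    y = 1.0 + 0.8 * group + rng.normal(0.0, 0.5, 40)
    w = rng.uniform(1.0, 3.0, 40)                     # stand-in for estimated weights
    beta = wls_coef(X, y, w)
    se = np.sqrt(jackknife_variance(X, y, w))
    print("coefficients:", np.round(beta, 3), "jackknife SE:", np.round(se, 3))
```

A sanity check on the formula: with an intercept-only design and unit weights, the jackknife variance of the fitted intercept reduces exactly to the familiar s^2 / n for the sample mean.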
Keywords: Differential expression; Informative dropout; Inverse probability weighting; Jackknife; Single-cell RNA sequencing data.
Published by Elsevier Ltd.