Mathematical model for the relationship between single-cell and bulk gene expression to clarify the interpretation of bulk gene expression data

Comput Struct Biotechnol J. 2022 Sep 5;20:4850-4859. doi: 10.1016/j.csbj.2022.08.062. eCollection 2022.

Abstract

Background: Differential expression analysis is a standard approach in molecular biology. For example, genes whose expression levels differ between diseased and non-diseased samples are considered to be associated with that disease. Differential variability analysis, by contrast, focuses on differences in the variance of gene expression between sample groups. Although differential variability is also known to capture biological information, its interpretation remains unclear and controversial. Recent single-cell analyses have revealed that differences between sample groups can affect gene expression in a cellular subset-specific manner or by altering the proportion of a particular cellular subset. The aim of this study is to clarify the interpretation of the mean and variance of bulk gene expression data.

Method: We developed a mathematical model in which the bulk gene expression value is proportional to the mean value of the single-cell gene expression profile. Based on this model, we performed theoretical analyses, simulations, and analyses of real single-cell RNA-seq data.
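
The proportionality stated above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `simulate_bulk`, the two-subset mixture, the Gaussian single-cell distributions, and all parameter values are assumptions chosen only to make the idea concrete.

```python
import random
import statistics

def simulate_bulk(n_cells, subset_props, subset_means, subset_sds,
                  scale=1.0, seed=0):
    """Return a bulk value as scale * mean of simulated single-cell values.

    Hypothetical sketch: each cell is assigned to a subset according to
    subset_props, and its expression is drawn from a Gaussian with that
    subset's mean and standard deviation (a simplifying assumption).
    """
    rng = random.Random(seed)
    subsets = list(range(len(subset_props)))
    cells = []
    for _ in range(n_cells):
        # Pick the cellular subset this cell belongs to.
        k = rng.choices(subsets, weights=subset_props)[0]
        # Draw the cell's expression value from that subset's distribution.
        cells.append(rng.gauss(subset_means[k], subset_sds[k]))
    # Bulk expression is proportional to the single-cell mean.
    return scale * statistics.fmean(cells)

# Two equally sized subsets with means 2 and 6: the bulk value should be
# close to the mixture mean 0.5 * 2 + 0.5 * 6 = 4.
bulk = simulate_bulk(20000, [0.5, 0.5], [2.0, 6.0], [1.0, 1.0])
print(bulk)
```

Under these assumptions, the bulk value depends only on the subset proportions and subset means, which is why distinct single-cell profiles can yield the same bulk measurement.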

Result and conclusion: We identified how differences in single-cell gene expression profiles affect the differences in the mean and the variance of bulk gene expression. We show that differential expression analysis of bulk expression data can overlook significant changes in gene expression at the single-cell level. Further, differential variability analysis captures a composite signal shaped by subset-specific shifts in gene expression, changes in the proportions of cellular subsets, and variation in single-cell distribution parameters among samples.
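
One way to see how a bulk mean can hide single-cell changes is a small worked calculation. It is an illustrative sketch, not a result from the study: the helper `expected_bulk`, the two subsets, the per-sample proportions, and the subset means are all hypothetical values.

```python
import statistics

def expected_bulk(prop_a, mean_a, mean_b, scale=1.0):
    """Expected bulk value for a two-subset sample (hypothetical helper).

    prop_a is the proportion of subset A; subset B makes up the rest.
    """
    return scale * (prop_a * mean_a + (1.0 - prop_a) * mean_b)

# Assumed proportion of subset A in each of five samples per group.
props = [0.3, 0.4, 0.5, 0.6, 0.7]

# Control: both subsets express the gene at the same level (4).
control = [expected_bulk(p, 4.0, 4.0) for p in props]
# Disease: subset A shifts up to 6 and subset B shifts down to 2.
disease = [expected_bulk(p, 6.0, 2.0) for p in props]

mean_control = statistics.fmean(control)
mean_disease = statistics.fmean(disease)
var_control = statistics.variance(control)
var_disease = statistics.variance(disease)

print(mean_control, mean_disease)  # group means are approximately equal
print(var_control, var_disease)    # between-sample variance differs
```

In this toy setting the two groups share the same bulk mean, so a test on means would detect nothing, while the between-sample variance differs because the opposing subset shifts make the bulk value sensitive to subset proportions.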

Keywords: Cellular heterogeneity; Differential expression analysis; Differential variability analysis; Gene expression; Probability distribution; Single cell.