Deconvolution analysis of cell-type expression from bulk tissues by integrating with single-cell expression reference

Genet Epidemiol. 2022 Jul 5. doi: 10.1002/gepi.22494. Online ahead of print.

Abstract

To understand phenotypic variation and the key factors that affect disease susceptibility for complex traits, it is important to decipher the cell-type composition of tissues. To study the cellular composition of bulk tissue samples, one can estimate cellular abundances and cell-type-specific gene expression patterns from the tissue transcriptome profiles. We develop both fixed-effect and mixed-effect models to reconstruct cellular expression fractions for bulk-profiled samples using single-cell (sc) RNA-sequencing (RNA-seq) reference data. In benchmark evaluations of estimating cellular expression fractions, the mixed-effect models give results similar to those of cell-type identification by estimating relative subsets of RNA transcripts (CIBERSORTx), a well-known and reliable machine learning algorithm for reconstructing cell-type abundances and cell-type-specific gene expression profiles. In real data analysis, the mixed-effect models outperform or perform similarly to CIBERSORTx. The mixed-effect models perform better than the fixed-effect models in both the benchmark evaluations and the real data analysis. In simulation studies, we show that if heterogeneity exists in the scRNA-seq data, it is better to use mixed-effect models with heterogeneous mean and variance-covariance. As a byproduct, the mixed-effect models provide fractions of covariance between subject-specific gene expression and cell types, which measure their correlation. The proposed mixed-effect models provide a complementary tool for dissecting bulk tissues using scRNA-seq data.
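The general idea of reference-based deconvolution can be illustrated with a minimal sketch. This is not the fixed or mixed model of the paper, whose details the abstract does not give. It swaps in plain non-negative least squares (NNLS) as a stand-in for a fixed-effect fit, and it assumes that a bulk profile is a non-negative mixture of cell-type reference profiles. The function name `estimate_fractions`, the simulated `signature` matrix, and all numeric settings are hypothetical choices made for illustration only.

```python
import numpy as np
from scipy.optimize import nnls


def estimate_fractions(signature, bulk):
    """Estimate cell-type fractions for one bulk sample (illustrative only).

    signature: (genes x cell types) reference matrix, e.g. mean expression
               per cell type computed from scRNA-seq data (assumed given).
    bulk:      (genes,) bulk expression vector for a single sample.
    """
    # Non-negative least squares: minimize ||signature @ coef - bulk||
    # subject to coef >= 0.
    coef, _residual = nnls(signature, bulk)
    total = coef.sum()
    if total == 0:
        raise ValueError("all estimated coefficients are zero")
    # Rescale so the estimated fractions sum to one.
    return coef / total


# Simulated data (hypothetical, not from the paper).
rng = np.random.default_rng(0)
n_genes, n_types = 200, 4
signature = rng.gamma(shape=2.0, scale=50.0, size=(n_genes, n_types))
true_fractions = np.array([0.5, 0.3, 0.15, 0.05])

# Bulk profile = mixture of the reference profiles plus small Gaussian noise.
bulk = signature @ true_fractions + rng.normal(0.0, 1.0, size=n_genes)

estimated = estimate_fractions(signature, bulk)
print("true:     ", true_fractions)
print("estimated:", np.round(estimated, 3))
```

A fit of this kind treats the reference profiles as fixed and known. A mixed-effect formulation, as described in the abstract, would additionally model subject-level variation in the reference expression, which this sketch does not attempt.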

Keywords: bulk tissues; cellular abundances; cellular expression patterns; mixed-effect models; scRNA-seq.