Genome Biol. 2015 Dec 10;16:278.
doi: 10.1186/s13059-015-0844-5.

MAST: A Flexible Statistical Framework for Assessing Transcriptional Changes and Characterizing Heterogeneity in Single-Cell RNA Sequencing Data

Free PMC article

Greg Finak et al. Genome Biol. 2015.

Abstract

Single-cell transcriptomics reveals gene expression heterogeneity but suffers from stochastic dropout and characteristic bimodal expression distributions in which expression is either strongly non-zero or non-detectable. We propose a two-part, generalized linear model for such bimodal data that parameterizes both of these features. We argue that the cellular detection rate, the fraction of genes expressed in a cell, should be adjusted for as a source of nuisance variation. Our model provides gene set enrichment analysis tailored to single-cell data. It provides insights into how networks of co-expressed genes evolve across an experimental treatment. MAST is available at https://github.com/RGLab/MAST.
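The two quantities named in the abstract can be illustrated with a minimal sketch. It is not the MAST implementation. It only computes the cellular detection rate per cell and splits one gene's expression into a discrete part (proportion of cells in which the gene is detected) and a continuous part (mean expression among detecting cells). The function names, the genes-by-cells layout, and the detection threshold of zero are assumptions made for illustration.

```python
import numpy as np


def cellular_detection_rate(expr, threshold=0.0):
    """Fraction of genes detected in each cell.

    expr is assumed to be a genes x cells array; a gene counts as
    detected when its value exceeds `threshold` (assumed to be 0 here).
    """
    expr = np.asarray(expr, dtype=float)
    return (expr > threshold).mean(axis=0)


def two_part_summary(gene_expr, threshold=0.0):
    """Split one gene's expression across cells into two components.

    Returns (detection proportion, mean expression among detecting cells).
    The conditional mean is NaN when no cell detects the gene.
    """
    gene_expr = np.asarray(gene_expr, dtype=float)
    detected = gene_expr > threshold
    proportion = detected.mean()
    cond_mean = gene_expr[detected].mean() if detected.any() else float("nan")
    return proportion, cond_mean


# Toy data (hypothetical): 4 genes x 3 cells.
toy = np.array([
    [0, 2, 4],
    [0, 0, 6],
    [1, 0, 0],
    [0, 3, 0],
])
cdr = cellular_detection_rate(toy)
proportion, cond_mean = two_part_summary(toy[0])
```

In a two-part model, each component would then be regressed on the experimental covariates, with the detection rate included as a nuisance covariate; the sketch stops at the descriptive summaries.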

Figures

Fig. 1
Cellular detection rate correlates with the first two principal components of variation. The fraction of genes expressed, or cellular detection rate (CDR), correlates mostly with the first principal component (PC) of variation in the myeloid dendritic cells (DC) data set (a, c) and mostly with the second PC in the mucosal-associated invariant T (MAIT) data set (b, d)
Fig. 2
Single-cell expression (log2-transcripts per million) of the top 100 genes identified as differentially expressed between cytokine (IL18, IL15, IL12)-stimulated (purple) and non-stimulated (pink) MAIT cells using MAST (a). Partial residuals for up-regulated and down-regulated genes are accumulated to yield an activation score (b), and this score suggests that the stimulated cells have a more heterogeneous response to stimulation than do the non-stimulated cells
Fig. 3
Module scores for individual cells for the top nine enriched modules (a) and decomposed Z-scores (b) for single-cell gene set enrichment analysis in the MAIT data set, using the blood transcription modules (BTM) database. The distribution of module scores suggests heterogeneity among individual cells with respect to different biological processes. Enrichment of modules in stimulated and non-stimulated cells is due to a combination of differences in the discrete (proportion) and continuous (mean conditional expression) components of genes in modules. The combined Z-score reflects the enrichment due to differences in the continuous and discrete components
Fig. 4
Gene–gene correlation (Pearson’s rho) of model residuals in non-stimulated (a) and stimulated (b) cells, and a principal components analysis biplot of model residuals (c) on both populations using the top 50 marginally differentially expressed genes. As marginal changes in the genes attributable to stimulation and CDR have been removed, clustering of subpopulations in (c) indicates co-expression of the indicated genes on a cellular basis. PC principal component
Fig. 5
Module scores (a) and decomposed Z-scores (b) for single-cell gene set enrichment analysis for lipopolysaccharide (LPS)-stimulated myeloid dendritic cells (mDC data set), using the mouse gene ontology (GO) biological process database. The change in single-cell module scores over time for the nine most significantly enriched modules in response to LPS stimulation is shown in (a). The “core antiviral”, “peaked inflammatory,” and “sustained inflammatory” modules are among the top enriched modules, consistent with the original publication. Additionally, we identified the GO modules “cellular response to interferon-beta” and “response to virus,” which behave analogously to the core antiviral and sustained inflammatory modules. No GO analog for the “peaked inflammatory” module was detected. The majority of modules detected exhibited enrichment relative to the 1 h time point (thus increasing with time). The “early marcher” cells identified in the original publication are highlighted here with triangles. We show the top 50 most significant modules (b). The combined Z-score summarizes the changes in the discrete and continuous components of expression
Fig. 6
Principal components analysis biplot of model residuals (a) and gene–gene correlation (Pearson’s rho) of model residuals (b) by time point for lipopolysaccharide-induced myeloid dendritic cells (mDC data set) using 20 genes with the largest log-fold changes, given significant (false discovery rate q < 0.01) marginal changes in expression. Principal component 1 (PC1) is correlated with change over time. The two “early marcher” cells are highlighted by an asterisk at the 1 h time point. Correlation structure in the residuals is increasingly evident over time and can be clearly observed at the 6 h time point compared to the earlier time points
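
The residual-correlation idea behind Figs. 4 and 6 can be sketched as follows. This is a simplified stand-in, not the model used in the paper: it removes marginal covariate effects with ordinary least squares rather than the two-part model, then computes gene–gene Pearson correlation on what remains. The function name, the cells-by-genes layout, and the choice of covariates (a stimulation indicator and the CDR) are assumptions for illustration.

```python
import numpy as np


def residual_correlation(expr, covariates):
    """Gene-gene Pearson correlation after regressing out covariates.

    expr: cells x genes array of expression values (assumed layout).
    covariates: cells x k array, e.g. stimulation indicator and CDR.
    Ordinary least squares is used here as a simplification.
    """
    expr = np.asarray(expr, dtype=float)
    covariates = np.asarray(covariates, dtype=float)
    # Design matrix with an intercept column.
    design = np.column_stack([np.ones(expr.shape[0]), covariates])
    # Fit all genes at once; each column of `coef` belongs to one gene.
    coef, *_ = np.linalg.lstsq(design, expr, rcond=None)
    resid = expr - design @ coef
    # Columns are genes, so correlate across columns.
    corr = np.corrcoef(resid, rowvar=False)
    return resid, corr


# Simulated example (hypothetical data): genes 0 and 1 share a hidden
# co-expression signal that is unrelated to stimulation or CDR.
rng = np.random.default_rng(0)
n_cells = 40
stim = np.repeat([0.0, 1.0], n_cells // 2)
cdr = rng.uniform(0.2, 0.8, n_cells)
shared = rng.normal(size=n_cells)
expr = np.column_stack([
    2.0 * stim + 3.0 * cdr + shared + 0.1 * rng.normal(size=n_cells),
    -1.0 * stim + shared + 0.1 * rng.normal(size=n_cells),
    rng.normal(size=n_cells),
])
resid, corr = residual_correlation(expr, np.column_stack([stim, cdr]))
```

Because the covariate effects are removed first, any correlation left between genes 0 and 1 reflects co-expression at the level of individual cells rather than a shared response to stimulation or detection rate.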


References

    1. Elowitz MB, Levine AJ, Siggia ED, Swain PS. Stochastic gene expression in a single cell. Science. 2002;297:1183–6. doi: 10.1126/science.1070919.
    2. Raj A, van den Bogaard P, Rifkin SA, van Oudenaarden A, Tyagi S. Imaging individual mRNA molecules using multiple singly labeled probes. Nat Methods. 2008;5:877–9. doi: 10.1038/nmeth.1253.
    3. Sanchez A, Golding I. Genetic determinants and cellular constraints in noisy gene expression. Science. 2013;342:1188–93. doi: 10.1126/science.1242975.
    4. McDavid A, Finak G, Chattopadyay PK, Dominguez M, Lamoreaux L, Ma SS, et al. Data exploration, quality control and testing in single-cell qPCR-based gene expression experiments. Bioinformatics. 2013;29:461–7. doi: 10.1093/bioinformatics/bts714.
    5. Shalek AK, Satija R, Shuga J, Trombetta JJ, Gennert D, Lu D, et al. Single-cell RNA-seq reveals dynamic paracrine control of cellular variation. Nature. 2014;510:263–9. doi: 10.1038/nature13235.
