Nat Methods. 2016 Mar;13(3):241-4.
doi: 10.1038/nmeth.3734. Epub 2016 Jan 18.

Characterizing transcriptional heterogeneity through pathway and gene set overdispersion analysis

Jean Fan et al. Nat Methods. 2016 Mar.

Abstract

The transcriptional state of a cell reflects a variety of biological factors, from cell-type-specific features to transient processes such as the cell cycle, all of which may be of interest. However, identifying such aspects from noisy single-cell RNA-seq data remains challenging. We developed pathway and gene set overdispersion analysis (PAGODA) to resolve multiple, potentially overlapping aspects of transcriptional heterogeneity by testing gene sets for coordinated variability among measured cells.

Figures

Figure 1. Pathway and gene set overdispersion analysis (PAGODA)
Transcriptional heterogeneity is analyzed through the following key steps:
1. Error models are fit for each cell to quantify the dependency of amplification noise and drop-out probabilities on the expression magnitude. A model fit for a cell is shown, separating drop-out and amplified components, and the 95% confidence envelope of the amplified component.
2. The residual expression variance magnitude for each gene is determined relative to the transcriptome-wide expectation model (red curve), taking into account the uncertainty in the variance estimates of each gene by determining effective degrees of freedom (k_g) for the χ² distribution.
3. Weighted PCA is performed independently on functionally annotated gene sets, as well as de novo gene sets determined based on correlated expression in the current dataset.
4. If the amount of variance explained by a principal component of a gene set is significantly higher than expected, the gene set is called overdispersed, and the cell scores defined by that principal component (coded in orange-green gradient) are included as one of the significant aspects of heterogeneity.
5. Redundant aspects that are driven by the same genes or show similar patterns of cell separation are grouped to provide a succinct overview of heterogeneity.
6. A web browser-based interface is used to navigate the identified aspects of heterogeneity, associated gene sets and gene expression patterns.
7. Depending on the biological question, some of the detected aspects of heterogeneity may be deemed artifactual or extraneous, and can be actively controlled for in a subsequent iteration.
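Steps 3 and 4 can be illustrated with a minimal Python sketch. It is not the PAGODA implementation: it uses plain (unweighted) PCA, skips the per-cell error models and variance normalization of steps 1 and 2, and replaces the caption's "higher than expected" test with an empirical comparison against randomly drawn gene sets of the same size. The function names `top_pc_variance` and `overdispersion_pvalue`, the input layout (a dict mapping gene name to per-cell values) and all parameters are assumptions made for illustration.

```python
import random


def top_pc_variance(rows, n_iter=100, seed=0):
    """Variance captured by the first principal component of a genes x cells matrix."""
    n_cells = len(rows[0])
    # Center each gene (row) across cells.
    centered = []
    for r in rows:
        mean = sum(r) / n_cells
        centered.append([x - mean for x in r])
    # Gene-by-gene sample covariance matrix.
    cov = [[sum(a * b for a, b in zip(ri, rj)) / (n_cells - 1)
            for rj in centered] for ri in centered]
    # Power iteration approaches the leading eigenvector of the covariance.
    rng = random.Random(seed)
    v = [rng.uniform(0.5, 1.5) for _ in rows]
    for _ in range(n_iter):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            return 0.0  # the gene set has no variance at all
        v = [x / norm for x in w]
    # The Rayleigh quotient of the unit vector v estimates the leading eigenvalue.
    cv = [sum(c * x for c, x in zip(row, v)) for row in cov]
    return sum(a * b for a, b in zip(v, cv))


def overdispersion_pvalue(expr, gene_set, n_random=200, seed=0):
    """Return (PC1 variance, empirical p-value) for one gene set.

    Assumption: the null is built from random gene sets of equal size,
    which stands in for the expectation model described in the caption.
    """
    rng = random.Random(seed)
    observed = top_pc_variance([expr[g] for g in gene_set])
    genes = sorted(expr)
    hits = 0
    for _ in range(n_random):
        sample = rng.sample(genes, len(gene_set))
        if top_pc_variance([expr[g] for g in sample]) >= observed:
            hits += 1
    # Add-one correction keeps the empirical p-value strictly positive.
    return observed, (hits + 1) / (n_random + 1)
```

A gene set whose members vary together across cells concentrates variance on its first principal component, so its value exceeds that of most random sets and the empirical p-value is small. The per-cell scores along that component would then serve as the "aspect" in step 4.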
Figure 2. PAGODA analysis of the 3,005 cells from mouse cortex and hippocampus measured by Zeisel et al.
The dendrogram shows the overall clustering of the cells, with the row immediately below specifying the group to which each cell was assigned in the original analysis by Zeisel et al. The main panel shows the top 9 significant aspects (P < 0.05) of heterogeneity (rows) detected by PAGODA based on gene sets defined by GO annotations, with the orange/white/green gradient indicating high/neutral/low score of a cell with respect to a given aspect. The aspect scores are oriented so that high (orange) and low (green) values generally correspond, respectively, to increased and decreased expression of the associated gene sets. Row labels summarize the key functional annotations of the gene sets in each aspect. Two subsequent panels show expression patterns of top-loading genes for innate immune response (from the aspect distinguishing neuroglia) and myelin sheath (distinguishing oligodendrocytes). A population of ~35 cells expressing both signatures is marked by a green bar, and most likely represents capture of two associated cells of different types. The bottom panel shows images of the microfluidic traps corresponding to some of the dual-signature cells, along with cells (leftmost two) exhibiting only the oligodendrocyte signature. Green boxes below the main panel highlight cells showing a combination of the oligodendrocyte signature with other cell types (numbered 1–5: vascular endothelial, astrocytes, CA1 neurons, Gad1/2 interneurons and neuroglia). Detailed composition is available through an interactive online view.
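The orientation of aspect scores mentioned in this caption can be sketched as a sign convention. The sign of a principal component is arbitrary, so one simple option (an assumption here, not necessarily the rule PAGODA applies) is to flip the scores whenever they are negatively associated with the mean expression of the gene set across cells. The function name `orient_scores` and its inputs are hypothetical.

```python
def orient_scores(scores, gene_rows):
    """Flip aspect scores so that high scores go with high gene set expression.

    scores:    one score per cell
    gene_rows: genes x cells expression values for the associated gene set
    """
    n_cells = len(scores)
    # Mean expression of the gene set in each cell.
    mean_expr = [sum(r[c] for r in gene_rows) / len(gene_rows)
                 for c in range(n_cells)]
    s_mean = sum(scores) / n_cells
    e_mean = sum(mean_expr) / n_cells
    # Sign of the covariance between scores and mean expression.
    cov = sum((s - s_mean) * (e - e_mean) for s, e in zip(scores, mean_expr))
    return [-s for s in scores] if cov < 0 else list(scores)
```

With this convention a cell shown in orange would have above-average expression of the gene set, matching the reading of the heatmap described above.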
Figure 3. Transcriptional heterogeneity of 65 neuronal progenitor cells in embryonic mouse cortex
a. Top eight significant (P < 0.01) aspects of heterogeneity are shown, labeled by their primary GO category or driving genes. Details are available through an online browser. The top aspect tracks induction of neuronal maturation pathways, driving the overall subpopulation structure. Mitotic and S-phase signatures in early NPCs account for the next two most significant aspects, with the S-phase aspect incorporating closely matching expression patterns of genes responsible for NPC maintenance. Color codes in the top panel summarize key subpopulations of NPCs distinguished by the detected heterogeneity aspects.
b. Anatomical placement of the early vs. maturing NPC classes within the embryonic brain. In situ hybridization signals in E13.5 mouse brain are shown for Tyro3 and Nfasc, with the two heatmap rows above showing their expression in the scRNA-seq data. Computational prediction (third panel) based on the overall transcriptional profile places early NPCs near the ventricular zone (VZ), and maturing ones in the subventricular zone (SVZ)/CP regions. In situ images were generated by the Allen Institute for Brain Science. The lower panel shows anatomical placement of the Dlx-expressing NPCs, and in situ images for the associated genes.
c. Validation of genes associated with specific subpopulations by in situ hybridization. Coronal E13.5 brain sections were labeled using RNAscope probes for Rpa1 (left) and Ndn (right). Rpa1 showed high expression in the VZ and SVZ. Ndn, which marks a distinct subpopulation of both mature and early NPCs, shows prominent expression throughout the CP, with rarer high-expressing cells in the VZ and SVZ (black arrows).
