Cell. 2015 May 21;161(5):1187-1201.
doi: 10.1016/j.cell.2015.04.044.

Droplet Barcoding for Single-Cell Transcriptomics Applied to Embryonic Stem Cells

Free PMC article

Allon M Klein et al. Cell.


It has long been the dream of biologists to map gene expression at the single-cell level. With such data one might track heterogeneous cell sub-populations, and infer regulatory relationships between genes and pathways. Recently, RNA sequencing has achieved single-cell resolution. What is limiting is an effective way to routinely isolate and process large numbers of individual cells for quantitative in-depth sequencing. We have developed a high-throughput droplet-microfluidic approach for barcoding the RNA from thousands of individual cells for subsequent analysis by next-generation sequencing. The method shows a surprisingly low noise profile and is readily adaptable to other sequencing-based assays. We analyzed mouse embryonic stem cells, revealing in detail the population structure and the heterogeneous onset of differentiation after leukemia inhibitory factor (LIF) withdrawal. The reproducibility of these high-throughput single-cell data allowed us to deconstruct cell populations and infer gene expression relationships.


Figure 1. A platform for DNA barcoding thousands of cells
Cells are encapsulated into droplets with lysis buffer, reverse-transcription mix, and hydrogel microspheres carrying barcoded primers. After encapsulation primers are released. cDNA in each droplet is tagged with a barcode during reverse transcription. Droplets are then broken and material from all cells is linearly amplified before sequencing. UMI = unique molecular identifier.
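The tagging scheme in this legend (one barcode per droplet, one UMI per captured molecule) suggests a simple way to turn reads into molecule counts. The following is a minimal sketch, not the authors' pipeline: it assumes reads have already been mapped to genes and arrive as hypothetical `(barcode, umi, gene)` tuples, and it counts each distinct UMI once per barcode and gene so that amplification duplicates collapse.

```python
from collections import defaultdict


def count_umifm(reads):
    """Count distinct UMIs per cell barcode and gene.

    `reads` is an iterable of (barcode, umi, gene) tuples. This input
    format is an assumption made for illustration only.
    """
    seen = defaultdict(set)
    for barcode, umi, gene in reads:
        # Reads sharing barcode, gene and UMI are treated as copies
        # of one original molecule, so the set keeps a single entry.
        seen[(barcode, gene)].add(umi)

    counts = defaultdict(dict)
    for (barcode, gene), umis in seen.items():
        counts[barcode][gene] = len(umis)
    return dict(counts)


example_reads = [
    ("BC1", "AAGT", "Nanog"),
    ("BC1", "AAGT", "Nanog"),  # amplification duplicate, counted once
    ("BC1", "CCTA", "Nanog"),
    ("BC2", "AAGT", "Sox2"),
]
umifm = count_umifm(example_reads)
```

This sketch ignores sequencing errors in barcodes and UMIs, which a real workflow would need to handle.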
Figure 2. Barcoding hydrogel microsphere synthesis
A) Microfluidic preparation of hydrogel microspheres containing a common DNA primer. Scale bars 100 μm. B) The common DNA primer: acrylic phosphoroamidite moiety (blue), photo-cleavable spacer (green), T7 RNA polymerase promoter sequence (red) and sequencing primer (blue). C,D) Method for combinatorial barcoding of the microspheres. E) The fully assembled primer: T7 promoter (red), sequencing primer (blue), barcodes (green), synthesis adaptor (dark brown), UMI (yellow) and poly-T primer (purple). See also Fig. S1.
Figure 3. A droplet barcoding device
A) Microfluidic device design, see also Fig. S2. B,C) Snapshots of encapsulation (left) and collection (right) modules, see also Movies S1,S2. Arrows indicate cells (red), hydrogels (blue), and flow direction (black). Scale bars 100 μm. D) Droplet occupancy over time. E) Cell and hydrogel co-encapsulation statistics showing a high 1:1 cell:hydrogel correspondence. F) BioAnalyzer traces showing dependence of library abundance on primer photo-release. H) Number of cells/controls as a function of collection volume.
Figure 4. Technical noise in droplet barcoding
A) Droplet integrity control: mouse and human cells are co-encapsulated to allow unambiguous identification of barcodes shared across multiple cells; 4% of barcodes share mixed mouse/human reads. B) inDrops technical control schematic, and histogram of UMI-filtered mapped (UMIFM) reads per droplet. C) Unique gene symbols detected as a function of UMIFM reads per droplet. D) Mean UMIFM reads for spike-in molecules are linearly related to their input concentration, with a capture efficiency β = 7.1%. E) Method sensitivity S as a function of input RNA abundance; red curve is the sensitivity limit of binomial sampling ($S = 1 - e^{-\beta x}$). F) CV-mean plot of pure RNA after normalization. Data points correspond to individual gene symbols; solid curve is the binomial sampling noise limit. For abundant transcripts, droplet-to-droplet variability in method efficiency β sets a baseline CV (dashed curve: $\mathrm{CV}_\beta = 5\%$), see also Fig. S3. G) Relationships between observed and biological values of gene CVs, Fano Factors and correlations, showing how low efficiency dampens Fano Factors (Eq. 2) and weakens correlations (Eq. 3).
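The sampling limits in panels E and F can be illustrated numerically. The sketch below reads the sensitivity expression in the legend as S = 1 − e^(−βx), with x the number of input molecules, and uses the capture efficiency β = 7.1% from panel D. The exact binomial form and the CV formula are assumptions derived from a Binomial(x, β) capture model rather than expressions taken from the paper.

```python
import math

BETA = 0.071  # capture efficiency quoted in panel D (7.1%)


def sensitivity_poisson(x, beta=BETA):
    """Chance of detecting at least one molecule: 1 - exp(-beta * x)."""
    return 1.0 - math.exp(-beta * x)


def sensitivity_binomial(x, beta=BETA):
    """Exact binomial counterpart: 1 - (1 - beta) ** x."""
    return 1.0 - (1.0 - beta) ** x


def sampling_cv(mean_count, beta=BETA):
    """CV from binomial capture alone.

    For Binomial(x, beta): mean = beta * x and
    variance = beta * x * (1 - beta), so CV = sqrt((1 - beta) / mean).
    """
    return math.sqrt((1.0 - beta) / mean_count)


# Sensitivity and sampling CV for a few input abundances (molecules).
for x in (1, 10, 100):
    print(x, sensitivity_poisson(x), sampling_cv(BETA * x))
```

Under this model the two sensitivity forms agree closely at small β, and the sampling CV falls as one over the square root of the mean count, which matches the downward slope expected for a CV-mean plot.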
Figure 5. inDrop sequencing reveals ES cell population structure
A) CV-mean plot of the ES cell transcriptome. Pure RNA control (blue); genes significantly more variable than control (black). Solid and dashed curves are as in Fig. 4F [variability in cell size = 20%, see Theory Eq. (S4) in Supplemental Information]. Inset: gene CVs of two technical replicate cell populations (total n=5,956 cells), see also Fig. S4. B) Illustrative transcript counts showing low (Ttn), moderate (Trim28, Ly6a, Dppa5a) and high (Sparc, S100a6) expression variability; curve fits are Poisson (red) and Negative Binomial (blue) distributions. C) Above-Poisson (a.p.) noise, ($\mathrm{CV}^2 - 1/\mathrm{mean}$), of pluripotency and differentiation markers. D) Co-expression plots recapitulating known and novel gene expression relationships (see main text). E) The eigenvalue distribution of cell principal components (PC) reveals the number of non-trivial PCs detectable in the data (arrows), compared to eigenvalue distribution of randomized data (black) and to the Marcenko-Pastur distribution for a random matrix (red). F) The first four ES cell PCs and their coefficients, revealing three outlier populations. G) ES cell tSNE map revealing an axis of pluripotency-to-differentiation with fringe sub-populations at different points on the differentiation axis (see also Fig. S6). Top panel shows sub-populations visible in one projection. Lower panels show cells colored by abundance of specified gene sets (see Table S4).
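Two quantities in this legend are easy to compute directly. The sketch below reads the above-Poisson noise of panel C as CV² − 1/mean, which is zero for Poisson counts because their variance equals their mean. It also gives the standard upper edge of the Marchenko-Pastur distribution, above which correlation-matrix eigenvalues are unlikely to come from pure noise (panel E). The function names are hypothetical, and which matrix dimension counts as "features" or "samples" in the paper's analysis is an assumption here.

```python
import math
import statistics


def above_poisson_noise(counts):
    """Return CV^2 - 1/mean for one gene's counts across cells."""
    mean = statistics.fmean(counts)
    var = statistics.pvariance(counts)  # population variance
    cv_squared = var / mean ** 2
    return cv_squared - 1.0 / mean


def marchenko_pastur_upper_edge(n_features, n_samples):
    """Largest eigenvalue expected for a correlation matrix of pure noise.

    Assumes standardized (unit-variance) entries; the edge is
    (1 + sqrt(n_features / n_samples)) ** 2.
    """
    return (1.0 + math.sqrt(n_features / n_samples)) ** 2


# Variance equals mean here, so the above-Poisson noise is zero.
poisson_like = above_poisson_noise([0, 0, 2, 2])
# A gene expressed in a single cell is strongly over-dispersed.
bursty = above_poisson_noise([0, 0, 0, 8])
edge = marchenko_pastur_upper_edge(100, 400)
```

Eigenvalues of the observed data that exceed `edge` would be candidates for non-trivial principal components, in the spirit of the arrows in panel E.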
Figure 6. Regulatory information preserved in gene correlations
A) A strategy for inferring robust gene associations from cell-to-cell variability with weak and/or highly connected gene correlations, see also Fig. S6. B-D) Gene neighborhoods of Nanog, Sox2, and Cyclin B. Grey boxes mark validated pluripotency factors; blue boxes mark factors previously associated with a pluripotent state. E,F) Correlations of 44 cell cycle-regulated transcripts in a somatic cell line (K562) and in mouse ES cells show a loss of cell cycle-dependent transcription in ES cells (gene names in Fig. S6). Genes are ordered by hierarchical clustering. Color scale applies to (E,F).
Figure 7. Heterogeneity in differentiating ES cells
A) Changes in global population structure after LIF withdrawal seen by hierarchically clustering cell-cell correlations over highly variable genes. B,C) Average (B) and distribution (C) of gene expression after LIF withdrawal; violin plots in (C) indicate the fraction of cells expressing a given number of counts; points show top 5% of cells. D,E) First two PCs of 3,034 cells showing asynchrony in differentiation. F) Epiblast and PrEn cell fractions as a function of time. G) tSNE maps of differentiating ES cells, and of genes (right panel) reveal putative population markers (see also Fig. S7 and Table S4). H) Intrinsic dimensionality of gene expression variability in ES cells and following LIF withdrawal, showing a smaller fluctuation sub-space during differentiation. The pure RNA control lacks correlations and displays a maximal fluctuation sub-space.
