, 14 (5), 483-486

SC3: Consensus Clustering of Single-Cell RNA-seq Data

Vladimir Yu Kiselev et al. Nat Methods.

Abstract

Single-cell RNA-seq enables the quantitative characterization of cell types based on global transcriptome profiles. We present single-cell consensus clustering (SC3), a user-friendly tool for unsupervised clustering, which achieves high accuracy and robustness by combining multiple clustering solutions through a consensus approach (http://bioconductor.org/packages/SC3). We demonstrate that SC3 is capable of identifying subclones from the transcriptomes of neoplastic cells collected from patients.

Conflict of interest statement

Competing Financial Interests

No competing financial interests.

Figures

Figure 1. The SC3 framework for consensus clustering.
(a) Overview of clustering with the SC3 framework (see Methods). The consensus step is exemplified using the Treutlein data. (b) Published datasets used to set SC3 parameters. N is the number of cells in a dataset; k is the number of clusters originally identified by the authors. Units: RPKM is Reads Per Kilobase of transcript per Million mapped reads, RPM is Reads Per Million mapped reads, FPKM is Fragments Per Kilobase of transcript per Million mapped reads, TPM is Transcripts Per Million mapped reads. (c) Histogram of the d values for which an adjusted Rand index (ARI) > 0.95 is achieved on the gold standard datasets. The black vertical lines mark the interval d = 4-7% of the total number of cells N, within which classification accuracy is high. (d) 100 realizations of the SC3 clustering of the datasets shown in (b). Dots represent individual clustering runs. Bars correspond to the median of the dots. Red and grey colours correspond to clustering with and without the consensus step, respectively. The black line corresponds to ARI = 0.8. The dashed black line separates gold and silver standard datasets.
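
The captions report accuracy as the ARI between an inferred clustering and the reference labels. As a rough illustration of what that score measures, here is a minimal pure-Python sketch of the standard adjusted Rand index. The function name `adjusted_rand_index` and its handling of degenerate inputs are assumptions made for this example; it is not taken from the SC3 package.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same cells.

    Illustrative helper (hypothetical name), not part of SC3.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # assumption: trivially identical when there are no pairs

    # Contingency counts: cells per (reference cluster, inferred cluster) pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of cell pairs that are co-clustered in both, or in each, labeling.
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / comb(n, 2)  # agreement expected by chance
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # assumption: degenerate case, e.g. one cluster in both
    return (sum_pairs - expected) / (max_index - expected)
```

The score is 1 when the two partitions agree exactly, regardless of how the clusters are named, and is close to 0 when agreement is no better than chance.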
Figure 2. Benchmarking of SC3 against existing methods.
(a) SC3, tSNE+kmeans and pcaReduce were applied 100 times to each dataset. SNN-Cliq and SINCERA are deterministic and were run only once. SEURAT was also run once, but it was optimised over different values of the density parameter G (Methods). Each panel shows the ARI (black dots, Methods) between the inferred clusterings and the reference labels. Bars correspond to the median of the dots. For the Pollen and Usoskin datasets all different hierarchies were considered (Data Availability). The black line indicates ARI = 0.8. The dashed black line separates gold and silver standard datasets. (b) Number of clusters k̂ predicted by SC3, SINCERA and SNN-Cliq for all datasets. Ref is the reference clustering reported by the authors. (c) The performance of the hybrid SC3 (Methods). Dots represent outliers higher (lower) than the highest (lowest) value within 1.5 × IQR, where IQR is the interquartile range. The black line indicates ARI = 0.8. The dashed black line in the legend separates gold and silver standard datasets. (d) The consensus matrix as generated by SC3 for the Deng dataset (Methods). The matrix indicates how often each pair of cells was assigned to the same cluster by the different parameter combinations, as indicated by the colour bar (1 - always, 0 - never). SC3 finds a clustering with k = 10 clusters, separated by white lines as visual guides. The colours at the top represent the reference labels, corresponding to different stages of development (see colour guide).
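
The consensus matrix described in panel (d) can be sketched in a few lines. The example below assumes each clustering run is given as a list of cluster labels, one per cell, and computes the fraction of runs in which each pair of cells shares a cluster. The function name `consensus_matrix` and the input layout are assumptions made for illustration; this is not SC3's own implementation.

```python
def consensus_matrix(labelings):
    """Fraction of clustering runs in which each pair of cells shares a cluster.

    `labelings` is a list of runs; each run is a list of cluster labels,
    one per cell, in the same cell order. Illustrative helper (hypothetical
    name), not part of SC3.
    """
    if not labelings:
        raise ValueError("need at least one labeling")
    n_cells = len(labelings[0])
    if any(len(labels) != n_cells for labels in labelings):
        raise ValueError("all labelings must cover the same cells")

    # counts[i][j] = number of runs that put cells i and j in the same cluster.
    counts = [[0] * n_cells for _ in range(n_cells)]
    for labels in labelings:
        for i in range(n_cells):
            for j in range(n_cells):
                if labels[i] == labels[j]:
                    counts[i][j] += 1

    n_runs = len(labelings)
    # Normalise so that 1 means "always together" and 0 means "never together".
    return [[count / n_runs for count in row] for row in counts]


# Two runs over three cells: cells 0 and 1 are together in one run out of two.
example = consensus_matrix([[0, 0, 1], [0, 1, 1]])
```

Each entry lies between 0 and 1 and the diagonal is always 1, which matches the colour scale described in the caption. Cluster names do not need to match across runs, because only co-membership within a run is counted.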
Figure 3. Using SC3 to define subclones from two patients with myeloproliferative neoplasm.
Marker gene expression matrix (after Gene Filter and Log-transformation, Methods) of the combined dataset (patient 1 + patient 2). Clusters (separated by white vertical lines) correspond to k = 3 (Methods). Only the top 10 marker genes are shown for each cluster.
