Identification of cell types from single-cell transcriptomes using a novel clustering method

Bioinformatics. 2015 Jun 15;31(12):1974-80. doi: 10.1093/bioinformatics/btv088. Epub 2015 Feb 11.


Motivation: Recent advances in single-cell technologies have brought new insights into complex biological phenomena. In particular, genome-wide single-cell measurements such as transcriptome sequencing enable the characterization of cellular composition as well as functional variation in homogenic cell populations. An important step in single-cell transcriptome analysis is to group cells that belong to the same cell type based on their gene expression patterns. The corresponding computational problem is to cluster a noisy, high-dimensional dataset that has substantially fewer objects (cells) than variables (genes).

Results: In this article, we describe a novel algorithm named shared nearest neighbor (SNN)-Cliq that clusters single-cell transcriptomes. SNN-Cliq uses the concept of shared nearest neighbors, which has advantages in handling high-dimensional data. When evaluated on a variety of synthetic and real experimental datasets, SNN-Cliq outperformed the state-of-the-art methods tested. More importantly, the clustering results of SNN-Cliq reflect the cell types or origins with high accuracy.
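To make the shared-nearest-neighbor idea concrete, the following is a minimal sketch of a generic SNN graph. It is an illustration and not the SNN-Cliq implementation. It assumes Euclidean distance, a neighborhood size `k`, and an edge weight equal to the raw count of shared neighbors. SNN-Cliq's own edge weighting and its subsequent clustering step are not described in this abstract and are not reproduced here. The function names `k_nearest` and `snn_graph` and the toy data are hypothetical.

```python
import math
import random


def k_nearest(cells, k):
    """For each cell, return the set of indices of its k nearest other cells.

    Assumption: plain Euclidean distance on the expression vectors.
    """
    neighbors = []
    for i, a in enumerate(cells):
        dists = sorted(
            (math.dist(a, b), j) for j, b in enumerate(cells) if j != i
        )
        neighbors.append({j for _, j in dists[:k]})
    return neighbors


def snn_graph(cells, k):
    """Build a shared-nearest-neighbor graph as a dict {(i, j): weight}.

    Assumption: the weight is the number of neighbors that cells i and j
    have in common; pairs with no shared neighbor get no edge.
    """
    nn = k_nearest(cells, k)
    edges = {}
    for i in range(len(cells)):
        for j in range(i + 1, len(cells)):
            shared = len(nn[i] & nn[j])
            if shared > 0:
                edges[(i, j)] = shared
    return edges


# Toy data (hypothetical): 10 cells, 50 genes, two well-separated groups,
# mimicking the "fewer cells than genes" setting described above.
random.seed(0)
n_genes = 50
group_a = [[random.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(5)]
group_b = [[random.gauss(100.0, 1.0) for _ in range(n_genes)] for _ in range(5)]
cells = group_a + group_b

edges = snn_graph(cells, k=3)
```

In this toy setting, edges connect only cells from the same group, because similarity is defined by the overlap of neighbor lists and not by the raw distances themselves. A clustering step would then operate on this graph.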

Availability and implementation: The algorithm is implemented in MATLAB and Python. The source code can be downloaded at

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms*
  • Animals
  • Cell Lineage / genetics*
  • Cluster Analysis*
  • Embryo, Mammalian / cytology
  • Embryo, Mammalian / metabolism*
  • Gene Expression Regulation
  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • Mice
  • Neoplasms / genetics*
  • Programming Languages
  • Single-Cell Analysis / methods*
  • Transcriptome*