dropClust: efficient clustering of ultra-large scRNA-seq data

Nucleic Acids Res. 2018 Apr 6;46(6):e36. doi: 10.1093/nar/gky007.


Droplet-based single-cell transcriptomics has recently enabled parallel screening of tens of thousands of single cells. Clustering methods that scale to such high-dimensional data without compromising accuracy are scarce. We exploit Locality Sensitive Hashing, an approximate nearest neighbour search technique, to develop a de novo clustering algorithm for large-scale single-cell data. On a number of real datasets, dropClust outperformed the existing best-practice methods in terms of execution time, clustering accuracy and detectability of minor cell sub-types.
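To show what an approximate nearest neighbour search with Locality Sensitive Hashing looks like in general, here is a minimal sketch of random-hyperplane LSH under cosine similarity. It is not dropClust's implementation. It assumes `X` is a cells-by-features matrix (for example, normalised expression values), and the function names `build_lsh_tables` and `approx_neighbours` and the parameters `n_planes`, `n_tables` and `k` are hypothetical, chosen only for illustration.

```python
import numpy as np


def build_lsh_tables(X, n_planes=8, n_tables=4, seed=0):
    """Hash every row (cell) of X into buckets with random-hyperplane LSH.

    Each table draws n_planes random hyperplanes through the origin; a cell's
    bucket key is the pattern of signs of its projections onto them. Cells
    separated by a small angle land on the same side of most hyperplanes, so
    they tend to share a key in at least one table.
    """
    rng = np.random.default_rng(seed)
    tables = []
    for _ in range(n_tables):
        planes = rng.standard_normal((n_planes, X.shape[1]))
        codes = (X @ planes.T) > 0  # (n_cells, n_planes) boolean sign pattern
        buckets = {}
        for i, code in enumerate(codes):
            buckets.setdefault(code.tobytes(), []).append(i)
        tables.append((codes, buckets))
    return tables


def approx_neighbours(X, tables, i, k=5):
    """Return up to k approximate nearest neighbours of cell i.

    Candidates are the cells that share a bucket with i in any table; only
    those candidates are ranked by exact cosine similarity.
    """
    candidates = set()
    for codes, buckets in tables:
        candidates.update(buckets[codes[i].tobytes()])
    candidates.discard(i)
    if not candidates:
        return []
    cand = np.fromiter(candidates, dtype=int)
    norms = np.linalg.norm(X[cand], axis=1) * np.linalg.norm(X[i])
    sims = (X[cand] @ X[i]) / np.where(norms == 0, 1.0, norms)
    top = np.argsort(-sims)[:k]  # highest similarity first
    return cand[top].tolist()
```

The saving comes from scoring only the cells that collide with the query instead of all cells. More hyperplanes per table make buckets smaller and purer, while more tables recover neighbours that a single table would miss.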

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Cells, Cultured
  • Cluster Analysis*
  • Computational Biology / methods*
  • Gene Expression Profiling / methods*
  • HEK293 Cells
  • Humans
  • Jurkat Cells
  • Leukocytes, Mononuclear / cytology
  • Leukocytes, Mononuclear / metabolism
  • Megakaryocyte Progenitor Cells / cytology
  • Megakaryocyte Progenitor Cells / metabolism
  • RNA, Small Cytoplasmic / classification
  • RNA, Small Cytoplasmic / genetics*
  • Reproducibility of Results
  • Sequence Analysis, RNA
  • Single-Cell Analysis / methods


Substances

  • RNA, Small Cytoplasmic