Clustering Single-Cell Expression Data Using Random Forest Graphs

IEEE J Biomed Health Inform. 2017 Jul;21(4):1172-1181. doi: 10.1109/JBHI.2016.2565561. Epub 2016 May 10.

Abstract

Complex tissues such as brain and bone marrow are made up of multiple cell types. As the study of biological tissue structure progresses, cell-type-specific research becomes increasingly important. Novel single-cell profiling technologies such as single-cell cytometry give researchers access to valuable biological data. Applying machine-learning techniques to these high-throughput datasets provides deep insights into the cellular landscape of the tissue to which those cells belong. In this paper, we propose random-forest-based single-cell profiling, a new machine-learning-based technique, to profile the different cell types of intricate tissues using single-cell cytometry data. Our technique uses random forests to capture dependencies among cell markers and models the cellular populations as a cell network. This network helps us discover which cell types are present in the tissue. Our experimental results on public-domain datasets indicate promising performance and accuracy of our technique in extracting the cell populations of complex tissues.
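The abstract does not say how the forest is trained or how the cell network is built, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes an unsupervised forest of random partition trees, takes the fraction of trees in which two cells share a leaf as their similarity, links cells whose similarity reaches a cutoff, and reads connected components of that graph as candidate cell populations. All function names, parameters, and the cutoff value are hypothetical.

```python
import random
from itertools import combinations


def random_tree_leaves(cells, indices, depth, rng):
    """Split cells recursively on a random marker and threshold; return the leaves."""
    if depth == 0 or len(indices) < 2:
        return [indices]
    marker = rng.randrange(len(cells[0]))
    values = [cells[i][marker] for i in indices]
    low, high = min(values), max(values)
    if low == high:
        return [indices]
    threshold = rng.uniform(low, high)
    left = [i for i in indices if cells[i][marker] < threshold]
    right = [i for i in indices if cells[i][marker] >= threshold]
    if not left or not right:
        return [indices]
    return (random_tree_leaves(cells, left, depth - 1, rng)
            + random_tree_leaves(cells, right, depth - 1, rng))


def forest_proximity(cells, n_trees=100, depth=2, seed=0):
    """Fraction of trees in which each pair of cells lands in the same leaf.

    Assumption: leaf co-occurrence stands in for the marker dependencies
    that the paper's forest captures. The diagonal is left at 0.
    """
    rng = random.Random(seed)
    n = len(cells)
    shared = [[0] * n for _ in range(n)]
    for _ in range(n_trees):
        for leaf in random_tree_leaves(cells, list(range(n)), depth, rng):
            for i, j in combinations(leaf, 2):
                shared[i][j] += 1
                shared[j][i] += 1
    return [[count / n_trees for count in row] for row in shared]


def graph_components(proximity, min_proximity=0.5):
    """Label connected components of the graph with edges at proximity >= cutoff."""
    n = len(proximity)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and proximity[i][j] >= min_proximity:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels
```

Here each cell is a tuple of marker intensities, and `graph_components(forest_proximity(cells))` returns one integer label per cell. Connected components are the simplest possible way to read populations off the graph; a community-detection step could be substituted without changing the rest of the sketch.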

MeSH terms

  • Algorithms*
  • Animals
  • Bone Marrow Cells / cytology
  • Cluster Analysis
  • Computational Biology / methods*
  • Databases, Factual
  • Decision Trees
  • Gene Expression Profiling / methods*
  • Humans
  • Machine Learning
  • Mice
  • Single-Cell Analysis / methods*