An introduction to representation learning for single-cell data analysis

Cell Rep Methods. 2023 Aug 2;3(8):100547. doi: 10.1016/j.crmeth.2023.100547. eCollection 2023 Aug 28.

Abstract

Single-cell-resolved systems biology methods, including omics- and imaging-based measurement modalities, generate a wealth of high-dimensional data characterizing the heterogeneity of cell populations. Representation learning methods are routinely used to analyze these complex, high-dimensional data by projecting them into lower-dimensional embeddings. This facilitates the interpretation and interrogation of the structures, dynamics, and regulation of cell heterogeneity. Reflecting their central role in analyzing diverse single-cell data types, a myriad of representation learning methods exist, with new approaches continually emerging. Here, we contrast general features of representation learning methods spanning statistical, manifold learning, and neural network approaches. We consider key steps involved in representation learning with single-cell data, including data pre-processing, hyperparameter optimization, downstream analysis, and biological validation. Interdependencies and contingencies linking these steps are also highlighted. This overview is intended to guide researchers in the selection, application, and optimization of representation learning strategies for current and future single-cell research applications.
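
As a rough illustration of the projection step the abstract describes, the sketch below log-normalizes a count matrix and embeds it in two dimensions with principal component analysis (PCA). PCA stands in here only as a simple example of a statistical approach. The synthetic data, the function names `log_normalize` and `pca_embed`, and the `target_sum` value are all assumptions made for illustration and are not taken from the article.

```python
import numpy as np

def log_normalize(counts, target_sum=1e4):
    """Scale each cell (row) to a common total count, then apply log1p."""
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid division by zero for empty cells
    return np.log1p(counts / totals * target_sum)

def pca_embed(x, n_components=2):
    """Project the rows of x onto the top principal components via SVD."""
    centered = x - x.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    embedding = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)  # fraction of variance per component
    return embedding, explained[:n_components]

# Hypothetical toy data: two cell populations that differ in 10 of 50 genes.
rng = np.random.default_rng(0)
n_cells, n_genes = 200, 50
rates_a = rng.gamma(2.0, 1.0, size=n_genes)
rates_b = rates_a.copy()
rates_b[:10] *= 5.0
counts = np.vstack([
    rng.poisson(rates_a, size=(n_cells // 2, n_genes)),
    rng.poisson(rates_b, size=(n_cells // 2, n_genes)),
]).astype(float)

normalized = log_normalize(counts)              # pre-processing
embedding, explained = pca_embed(normalized, 2)  # cells x 2 embedding
print(embedding.shape, explained)
```

Here `n_components` plays the role of a hyperparameter: changing it, or changing the normalization that precedes it, alters the embedding that any downstream analysis would see, which is one simple instance of the interdependence between steps noted above.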

Keywords: deep learning; dimension reduction; hyperparameter; manifold learning; omics; systems microscopy.

Publication types

  • Review
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Data Analysis
  • Humans
  • Law Enforcement*
  • Learning*
  • Neural Networks, Computer
  • Research Personnel