Computational principles and challenges in single-cell data integration

Nat Biotechnol. 2021 Oct;39(10):1202-1215. doi: 10.1038/s41587-021-00895-7. Epub 2021 May 3.

Abstract

The development of single-cell multimodal assays provides a powerful tool for investigating multiple dimensions of cellular heterogeneity, enabling new insights into development, tissue homeostasis and disease. A key challenge in the analysis of single-cell multimodal data is to devise appropriate strategies for tying together data across different modalities. The term 'data integration' has been used to describe this task, encompassing a broad collection of approaches ranging from batch correction of individual omics datasets to association of chromatin accessibility and genetic variation with transcription. Although existing integration strategies exploit similar mathematical ideas, they typically have distinct goals and rely on different principles and assumptions. Consequently, new definitions and concepts are needed to contextualize existing methods and to enable development of new methods.

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Animals
  • Chromatin / genetics
  • Chromatin / metabolism
  • Computational Biology*
  • Data Interpretation, Statistical*
  • Genetic Variation
  • Genomics
  • Humans
  • Precision Medicine
  • Single-Cell Analysis*

Substances

  • Chromatin