Single-Cell Multiomics Integration by SCOT

J Comput Biol. 2022 Jan;29(1):19-22. doi: 10.1089/cmb.2021.0477. Epub 2022 Jan 5.

Abstract

Although the availability of various sequencing technologies allows us to capture different genome properties at single-cell resolution, with the exception of a few co-assaying technologies, applying different sequencing assays on the same single cell is impossible. Single-cell alignment using optimal transport (SCOT) is an unsupervised algorithm that addresses this limitation by using optimal transport to align single-cell multiomics data. First, it preserves the local geometry by constructing a k-nearest neighbor (k-NN) graph for each data set (or domain) to capture the intra-domain distances. SCOT then finds a probabilistic coupling matrix that minimizes the discrepancy between the intra-domain distance matrices. Finally, it uses the coupling matrix to project one single-cell data set onto another through barycentric projection, thus aligning them. SCOT requires tuning only two hyperparameters and is robust to the choice of one. Furthermore, the Gromov-Wasserstein distance in the algorithm can guide SCOT's hyperparameter tuning in a fully unsupervised setting when no orthogonal alignment information is available. Thus, SCOT is a fast and accurate alignment method that provides a heuristic for hyperparameter selection in a real-world unsupervised single-cell data alignment scenario. We provide a tutorial for SCOT and make its source code publicly available on GitHub.
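The three steps summarized above (k-NN intra-domain distances, a Gromov-Wasserstein coupling, and barycentric projection), together with the GW-guided hyperparameter heuristic, can be illustrated with the minimal NumPy/SciPy sketch below. It is not the SCOT source code, and several choices in it are assumptions made only for illustration:

- the function names;
- the unweighted k-NN graph with shortest-path (hop-count) distances;
- the log-domain entropic Gromov-Wasserstein iteration with square loss and uniform marginals, following Peyré et al. (2016);
- treating the neighbourhood size k and the entropic regularization epsilon as the two tuned hyperparameters;
- the small search grid;
- the synthetic spiral data.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import cdist
from scipy.special import logsumexp


def knn_geodesic(X, k):
    """Step 1 (assumed form): shortest-path distances on a k-NN graph, scaled to [0, 1]."""
    n = X.shape[0]
    D = cdist(X, X)
    np.fill_diagonal(D, np.inf)                    # a cell is not its own neighbour
    nbrs = np.argsort(D, axis=1)[:, :k]            # indices of the k nearest cells
    A = csr_matrix((np.ones(n * k), (np.repeat(np.arange(n), k), nbrs.ravel())), shape=(n, n))
    G = shortest_path(A, directed=False, unweighted=True)  # hop counts on the undirected graph
    G[np.isinf(G)] = G[np.isfinite(G)].max()       # patch disconnected components
    return G / G.max()


def sinkhorn_log(p, q, C, eps, n_iter=100):
    """Log-domain Sinkhorn: entropic OT plan with marginals p and q for cost C."""
    f, g = np.zeros_like(p), np.zeros_like(q)
    for _ in range(n_iter):
        f = eps * np.log(p) - eps * logsumexp((g[None, :] - C) / eps, axis=1)
        g = eps * np.log(q) - eps * logsumexp((f[:, None] - C) / eps, axis=0)
    return np.exp((f[:, None] + g[None, :] - C) / eps)


def entropic_gw(C1, C2, eps, n_outer=30):
    """Step 2: entropic Gromov-Wasserstein coupling (square loss, uniform marginals)."""
    n, m = len(C1), len(C2)
    p, q = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    # (a - b)^2 = a^2 + b^2 - 2ab turns the 4-index GW sum into matrix products
    const = np.outer((C1 ** 2) @ p, np.ones(m)) + np.outer(np.ones(n), (C2 ** 2) @ q)
    T = np.outer(p, q)                             # start from the independent coupling
    for _ in range(n_outer):
        cost = const - 2.0 * C1 @ T @ C2.T         # cost matrix induced by the current T
        T = sinkhorn_log(p, q, cost, eps)
    gw = float(np.sum((const - 2.0 * C1 @ T @ C2.T) * T))  # GW objective, entropy excluded
    return T, gw


def barycentric_projection(T, Y):
    """Step 3: send each X cell to the coupling-weighted mean of the Y cells."""
    return (T @ Y) / T.sum(axis=1, keepdims=True)


def align_sketch(X, Y, ks=(5, 10), epsilons=(5e-3, 1e-2)):
    """Hypothetical grid search that keeps the (k, eps) pair with the lowest GW value."""
    best = None
    for k in ks:
        Cx, Cy = knn_geodesic(X, k), knn_geodesic(Y, k)
        for eps in epsilons:
            T, gw = entropic_gw(Cx, Cy, eps)
            if best is None or gw < best["gw"]:
                best = {"k": k, "epsilon": eps, "gw": gw, "T": T}
    return barycentric_projection(best["T"], Y), best


# Toy data (hypothetical): one 3-D spiral observed through two random linear "assays".
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 3.0 * np.pi, 60))
latent = np.c_[np.cos(t), np.sin(t), t / 3.0]
X = latent @ rng.normal(size=(3, 20)) + 0.05 * rng.normal(size=(60, 20))
Y = latent @ rng.normal(size=(3, 10)) + 0.05 * rng.normal(size=(60, 10))

X_aligned, best = align_sketch(X, Y)
print(X_aligned.shape, best["k"], best["epsilon"], round(best["gw"], 4))
```

Only domain X is projected, so X_aligned lives in Y's 10-dimensional feature space and can be compared with Y directly, even though the two domains started with different numbers of features. Ranking configurations by their raw GW value across different k is a simplification of the unsupervised heuristic described above; the published procedure may weigh configurations differently.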

Keywords: data integration; manifold alignment; multiomics; optimal transport; single-cell genomics.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms*
  • Computational Biology
  • Databases, Genetic / statistics & numerical data
  • Genomics / statistics & numerical data
  • Heuristics
  • Humans
  • Neural Networks, Computer
  • Sequence Alignment / statistics & numerical data*
  • Sequence Analysis / statistics & numerical data
  • Single-Cell Analysis / statistics & numerical data*
  • Software
  • Unsupervised Machine Learning