Benchmarking principal component analysis for large-scale single-cell RNA-sequencing

Genome Biol. 2020 Jan 20;21(1):9. doi: 10.1186/s13059-019-1900-3.

Abstract

Background: Principal component analysis (PCA) is an essential method for analyzing single-cell RNA-seq (scRNA-seq) datasets, but on large-scale scRNA-seq datasets it takes a long time to compute and consumes large amounts of memory.

Results: In this work, we review the existing fast and memory-efficient PCA algorithms and implementations and evaluate their practical application to large-scale scRNA-seq datasets. Our benchmark shows that some PCA algorithms based on Krylov subspace methods and randomized singular value decomposition are fast, memory-efficient, and more accurate than the other algorithms.
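To make the randomized singular value decomposition idea concrete, the following is a minimal sketch in Python with NumPy. It is not one of the implementations benchmarked in this work; the function name `randomized_pca`, its parameters (`n_oversamples`, `n_power_iter`, `seed`), and the synthetic cells-by-genes matrix are all assumptions made for illustration. The sketch projects the centered matrix onto a small random subspace, refines that subspace with a few power iterations, and then runs an exact SVD only on the small projected matrix.

```python
import numpy as np


def randomized_pca(X, n_components, n_oversamples=10, n_power_iter=4, seed=0):
    """Approximate PCA of X (cells x genes) via randomized SVD.

    Illustrative sketch only: names and defaults are assumptions,
    not the API of any benchmarked package.
    """
    rng = np.random.default_rng(seed)
    # Center each gene. This densifies the matrix, which is fine for a
    # small demo but is exactly what large-scale implementations avoid.
    Xc = X - X.mean(axis=0)
    k = min(n_components + n_oversamples, min(Xc.shape))

    # Random projection to find an approximate basis for the column space.
    Q, _ = np.linalg.qr(Xc @ rng.standard_normal((Xc.shape[1], k)))

    # Power iterations sharpen the basis; QR keeps it numerically stable.
    for _ in range(n_power_iter):
        Q, _ = np.linalg.qr(Xc.T @ Q)
        Q, _ = np.linalg.qr(Xc @ Q)

    # Exact SVD on the small (k x genes) projected matrix.
    B = Q.T @ Xc
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small

    scores = U[:, :n_components] * s[:n_components]  # cell embeddings
    return scores, Vt[:n_components], s[:n_components]


# Synthetic stand-in for an expression matrix: 300 "cells" x 100 "genes"
# with a rank-5 signal plus small noise (hypothetical data).
rng = np.random.default_rng(1)
signal = 3.0 * rng.standard_normal((300, 5)) @ rng.standard_normal((5, 100))
X = signal + 0.1 * rng.standard_normal((300, 100))

scores, components, sing_vals = randomized_pca(X, n_components=5)

# Reference: singular values from a full SVD of the centered matrix.
exact = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
```

On this toy matrix the leading singular values in `sing_vals` can be compared against `exact[:5]` to check the approximation. The Krylov subspace approach pursues the same goal (a few leading components without a full decomposition) through iterative matrix-vector products rather than a single random projection.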

Conclusion: We develop a guideline to select an appropriate PCA implementation based on the differences in the computational environment of users and developers.

Keywords: Cellular heterogeneity; Dimension reduction; Julia; Online/incremental algorithm; Out-of-core; Principal component analysis; Python; R; Randomized algorithm; Single-cell RNA-seq; Sparse data format.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Benchmarking
  • Principal Component Analysis*
  • RNA-Seq / methods*
  • Single-Cell Analysis / methods*