Truncated Robust Principal Component Analysis and Noise Reduction for Single Cell RNA Sequencing Data

J Comput Biol. 2019 Aug;26(8):782-793. doi: 10.1089/cmb.2018.0255. Epub 2019 May 1.

Abstract

The development of single cell RNA sequencing (scRNA-seq) has enabled innovative approaches to investigating mRNA abundances. In our study, we are interested in extracting the systematic patterns of scRNA-seq data in an unsupervised manner; thus, we have developed two extensions of robust principal component analysis (RPCA). First, we present a truncated version of RPCA (tRPCA), which is much faster and more memory efficient. Second, we introduce noise reduction in tRPCA with L2 regularization. Unlike RPCA, which considers only a low-rank matrix L and a sparse matrix S, the proposed method can also extract a noise matrix E inherent in modern genomic data. We demonstrate its usefulness by applying our methods to peripheral blood mononuclear cell scRNA-seq data. In particular, clustering of the low-rank matrix L yields better classification of unlabeled single cells. Overall, the proposed variants are well suited for the high-dimensional and noisy data that are routinely generated in genomics.
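
The three-way decomposition described above (data = L + S + E) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a common formulation, minimizing the nuclear norm of L plus lam times the L1 norm of S plus (gamma/2) times the squared Frobenius norm of E, and solves it by simple alternating updates. The function names, the parameters k, lam, and gamma, and the default value of lam are all hypothetical choices made for illustration. For brevity the sketch computes a full SVD and keeps the top k components; a practical truncated variant would call a partial SVD solver instead, which is where the speed and memory savings would come from.

```python
import numpy as np


def soft_threshold(x, tau):
    """Shrink every entry of x toward zero by tau (L1 proximal operator)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def truncated_svt(x, tau, k):
    """Singular value thresholding restricted to the top k components.

    Assumption: a full SVD is computed and then truncated for simplicity;
    a partial SVD solver would avoid computing the discarded components.
    """
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    s_shrunk = soft_threshold(s[:k], tau)
    return (u[:, :k] * s_shrunk) @ vt[:k]


def trpca_sketch(m, k, lam=None, gamma=1.0, n_iter=50):
    """Split m into low-rank L, sparse S, and dense noise E with m = L + S + E.

    Hypothetical objective (an assumption, not taken from the paper):
        ||L||_* + lam * ||S||_1 + (gamma / 2) * ||m - L - S||_F^2
    with the rank of L capped at k.
    """
    m = np.asarray(m, dtype=float)
    if lam is None:
        # A commonly used default for RPCA-style problems.
        lam = 1.0 / np.sqrt(max(m.shape))
    low_rank = np.zeros_like(m)
    sparse = np.zeros_like(m)
    for _ in range(n_iter):
        # Update L with S fixed: shrink the top k singular values.
        low_rank = truncated_svt(m - sparse, 1.0 / gamma, k)
        # Update S with L fixed: shrink entries of the residual.
        sparse = soft_threshold(m - low_rank, lam / gamma)
    # Whatever is left over is attributed to the noise term E.
    noise = m - low_rank - sparse
    return low_rank, sparse, noise
```

In this sketch, a larger gamma penalizes the noise term more heavily and pushes the result toward the two-term L + S decomposition, while k bounds the rank of L. The low-rank output could then be passed to a clustering routine of the reader's choice.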

Keywords: matrix decomposition; principal component analysis; robust PCA; single cell RNA-seq; truncated singular value decomposition; unsupervised learning.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Databases, Nucleic Acid*
  • Humans
  • Sequence Analysis, RNA*
  • Single-Cell Analysis*