Robust classification of single-cell transcriptome data by nonnegative matrix factorization
- PMID: 27663498
- DOI: 10.1093/bioinformatics/btw607
Abstract
Motivation: Single-cell transcriptome data provide unprecedented resolution to study heterogeneity in cell populations and present a challenge for unsupervised classification. Popular methods, like principal component analysis (PCA), often suffer from the high level of noise in the data.
Results: Here we adapt Nonnegative Matrix Factorization (NMF) to the problem of identifying subpopulations in single-cell transcriptome data. In contrast to the conventional gene-centered use of NMF, which identifies metagenes, we applied NMF in a cell-centered direction to identify cell subtypes ('metacells'). Using three different datasets (based on RT-qPCR and single-cell RNA-seq data), we show that NMF outperforms PCA in identifying subpopulations accurately and robustly, without the need for prior feature selection; moreover, on a large dataset of thousands of single-cell transcriptomes, NMF successfully recovered the broad classes identified by a computationally sophisticated method. NMF also allows feature genes to be identified in a direct, unbiased manner. We propose novel approaches for determining a biologically meaningful number of subpopulations by minimizing the ambiguity of classification. In conclusion, our study shows that NMF is a robust, informative and simple method for the unsupervised learning of cell subtypes from single-cell gene expression data.
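The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes the expression matrix is arranged cells × genes, so that each cell's row of loadings in `W` plays the role of 'metacell' membership and each row of `H` is a gene profile from which feature genes can be read. It uses plain Lee-Seung multiplicative updates in NumPy rather than the nimfa package. The ambiguity score (runner-up loading divided by top loading, averaged over cells) and all helper names (`fit_metacells`, `assign_cells`, `feature_genes`, `ambiguity`, `choose_k`) are hypothetical illustrations, not the paper's definitions.

```python
import numpy as np


def nmf(X, k, n_iter=1000, seed=0, eps=1e-10):
    """Factor nonnegative X (n x m) as W (n x k) @ H (k x m).

    Lee-Seung multiplicative updates for the squared Frobenius loss.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    scale = np.sqrt(X.mean() / k)
    W = rng.random((n, k)) * scale + eps
    H = rng.random((k, m)) * scale + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def fit_metacells(X, k, n_restarts=5, n_iter=1000):
    """Assumption: X is cells x genes, so column j of W is 'metacell' j.

    Keeps the restart with the lowest reconstruction error.
    """
    best = None
    for seed in range(n_restarts):
        W, H = nmf(X, k, n_iter=n_iter, seed=seed)
        err = np.linalg.norm(X - W @ H)
        if best is None or err < best[0]:
            best = (err, W, H)
    _, W, H = best
    # Remove the W/H scale ambiguity: each gene profile (row of H) sums to 1.
    s = H.sum(axis=1, keepdims=True) + 1e-12
    return W * s.T, H / s


def assign_cells(W):
    """Assign each cell (row of W) to the factor with the largest loading."""
    return W.argmax(axis=1)


def feature_genes(H, n_top=5):
    """Indices of the top-weighted genes for each factor (row of H)."""
    return [np.argsort(row)[::-1][:n_top] for row in H]


def ambiguity(W):
    """Hypothetical score: mean ratio of runner-up to top loading per cell.

    0 means every cell is assigned unambiguously; 1 means a tie. Needs k >= 2.
    """
    top2 = np.sort(W, axis=1)[:, -2:]
    return float(np.mean(top2[:, 0] / (top2[:, 1] + 1e-12)))


def choose_k(X, candidates):
    """Pick the candidate k (each >= 2) with the lowest ambiguity score."""
    scores = {k: ambiguity(fit_metacells(X, k)[0]) for k in candidates}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic data: 3 groups of 10 cells, each expressing its own 10-gene block.
    X = rng.random((30, 30)) * 0.1
    for c in range(3):
        X[10 * c:10 * c + 10, 10 * c:10 * c + 10] += 5.0
    W, H = fit_metacells(X, k=3)
    print("cell labels:", assign_cells(W))
    print("feature genes per factor:", feature_genes(H, n_top=3))
    print("chosen k and scores:", choose_k(X, [2, 3, 4]))
```

Normalizing the rows of `H` before reading off assignments is a design choice: NMF is only defined up to a per-factor rescaling of `W` and `H`, so loadings must be put on a common scale before they can be compared across factors.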
Availability and implementation: https://github.com/ccshao/nimfa
Contact: c.shao@Dkfz-Heidelberg.de or t.hoefer@Dkfz-Heidelberg.de
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Similar articles
- Scalable preprocessing for sparse scRNA-seq data exploiting prior knowledge. Bioinformatics. 2018 Jul 1;34(13):i124-i132. doi: 10.1093/bioinformatics/bty293. PMID: 29949988. Free PMC article.
- SCMarker: Ab initio marker selection for single cell transcriptome profiling. PLoS Comput Biol. 2019 Oct 28;15(10):e1007445. doi: 10.1371/journal.pcbi.1007445. eCollection 2019 Oct. PMID: 31658262. Free PMC article.
- Using transfer learning from prior reference knowledge to improve the clustering of single-cell RNA-Seq data. Sci Rep. 2019 Dec 30;9(1):20353. doi: 10.1038/s41598-019-56911-z. PMID: 31889137. Free PMC article.
- Identifying cell populations with scRNASeq. Mol Aspects Med. 2018 Feb;59:114-122. doi: 10.1016/j.mam.2017.07.002. Epub 2017 Jul 25. PMID: 28712804. Review.
- Nonnegative matrix factorization: an analytical and interpretive tool in computational biology. PLoS Comput Biol. 2008 Jul 25;4(7):e1000029. doi: 10.1371/journal.pcbi.1000029. PMID: 18654623. Free PMC article. Review.
Cited by
- Single Cell RNA Sequencing of Rare Immune Cell Populations. Front Immunol. 2018 Jul 4;9:1553. doi: 10.3389/fimmu.2018.01553. eCollection 2018. PMID: 30022984. Free PMC article. Review.
- K-nearest-neighbors induced topological PCA for single cell RNA-sequence data analysis. Comput Biol Med. 2024 Jun;175:108497. doi: 10.1016/j.compbiomed.2024.108497. Epub 2024 Apr 24. PMID: 38678944. Free PMC article.
- SSRE: Cell Type Detection Based on Sparse Subspace Representation and Similarity Enhancement. Genomics Proteomics Bioinformatics. 2021 Apr;19(2):282-291. doi: 10.1016/j.gpb.2020.09.004. Epub 2021 Feb 27. PMID: 33647482. Free PMC article.
- ClusterMine: A knowledge-integrated clustering approach based on expression profiles of gene sets. J Bioinform Comput Biol. 2020 Jun;18(3):2040009. doi: 10.1142/S0219720020400090. PMID: 32698720. Free PMC article.
- A new and effective two-step clustering approach for single cell RNA sequencing data. BMC Genomics. 2023 Nov 9;23(Suppl 6):864. doi: 10.1186/s12864-023-09577-x. PMID: 37946133. Free PMC article.
