Principal component analysis of binary genomics data
- PMID: 30657888
- DOI: 10.1093/bib/bbx119
Abstract
Motivation: Genome-wide measurements of genetic and epigenetic alterations are generating ever more high-dimensional binary data. Because of the special mathematical characteristics of binary data, it is not obvious how to apply the classical principal component analysis (PCA) model directly to explore low-dimensional structure. Several PCA alternatives for binary data exist in the psychometric, data analysis and machine learning literature, but they are not well known in the bioinformatics community.
Results: In this article, we introduce the motivation and rationale of some parametric and nonparametric versions of PCA specifically geared to binary data. Using both realistic simulations of binary data and the mutation, CNA and methylation data of the Genomic Determinants of Sensitivity in Cancer 1000 (GDSC1000), we explored the methods' performance with respect to finding the correct number of components, overfitting, recovering the correct low-dimensional structure, variable importance, etc. The results show that if a low-dimensional structure exists in the data, most of the methods can find it. When a probabilistic generating process is assumed to underlie the data, we recommend the parametric logistic PCA model; when such an assumption is not valid and the data are considered as given, we recommend the nonparametric Gifi model.
Availability: The codes to reproduce the results in this article are available at the homepage of the Biosystems Data Analysis group (www.bdagroup.nl).
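To make the idea of a parametric logistic PCA model concrete, the following is a minimal sketch and not the authors' implementation (their code is at the address given above). It assumes the common formulation in which the logit of the probability matrix is a column offset plus a low-rank product, logit(P) = 1 mu^T + A B^T, fitted by minimizing the Bernoulli negative log-likelihood. The function name logistic_pca, its parameters, the plain gradient-descent optimizer and the simulated data are all illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function via the tanh identity.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def logistic_pca(X, k=2, n_iter=300, lr=0.2, seed=0):
    """Hypothetical helper: fit logit(P) = 1 mu^T + A B^T by gradient descent.

    X is an (n, p) array of 0/1 values, k the number of components.
    Returns offsets mu (p,), scores A (n, k), loadings B (p, k) and the
    negative log-likelihood recorded before each update.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    mu = np.zeros(p)
    A = 0.01 * rng.standard_normal((n, k))  # small random start
    B = 0.01 * rng.standard_normal((p, k))
    losses = []
    for _ in range(n_iter):
        theta = mu + A @ B.T  # matrix of logits
        # Bernoulli negative log-likelihood: sum log(1 + e^theta) - x * theta
        losses.append(float(np.sum(np.logaddexp(0.0, theta) - X * theta)))
        G = sigmoid(theta) - X  # gradient of the loss w.r.t. theta
        # Gradients rescaled by 1/n or 1/p to keep step sizes moderate.
        grad_mu = G.sum(axis=0) / n
        grad_A = G @ B / p
        grad_B = G.T @ A / n
        mu -= lr * grad_mu
        A -= lr * grad_A
        B -= lr * grad_B
    return mu, A, B, losses


# Simulated binary data from an assumed rank-2 logit structure.
rng = np.random.default_rng(1)
n, p, k = 60, 30, 2
scores_true = rng.standard_normal((n, k))
loadings_true = 1.5 * rng.standard_normal((p, k))
offset_true = rng.normal(-0.5, 0.5, size=p)
theta_true = offset_true + scores_true @ loadings_true.T
X = (rng.random((n, p)) < sigmoid(theta_true)).astype(float)

mu, A, B, losses = logistic_pca(X, k=k)
P_hat = sigmoid(mu + A @ B.T)  # fitted probabilities
print(f"negative log-likelihood: {losses[0]:.1f} -> {losses[-1]:.1f}")
```

The factors A and B are identified only up to an invertible k x k transformation, so in practice they would be orthogonalized (for example with an SVD of A B^T) before scores and loadings are interpreted. Without a penalty or constraint the fitted logits can also grow without bound on separable data, which relates to the overfitting issue mentioned in the abstract.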
