Fast, Exact Bootstrap Principal Component Analysis for p > 1 million
- PMID: 27616801
- PMCID: PMC5014451
- DOI: 10.1080/01621459.2015.1062383
Abstract
Many have suggested a bootstrap procedure for estimating the sampling variability of principal component analysis (PCA) results. However, when the number of measurements per subject (p) is much larger than the number of subjects (n), calculating and storing the leading principal components from each bootstrap sample can be computationally infeasible. To address this, we outline methods for fast, exact calculation of bootstrap principal components, eigenvalues, and scores. Our methods leverage the fact that all bootstrap samples occupy the same n-dimensional subspace as the original sample. As a result, all bootstrap principal components are limited to the same n-dimensional subspace and can be efficiently represented by their low dimensional coordinates in that subspace. Several uncertainty metrics can be computed solely based on the bootstrap distribution of these low dimensional coordinates, without calculating or storing the p-dimensional bootstrap components. Fast bootstrap PCA is applied to a dataset of sleep electroencephalogram recordings (p = 900, n = 392), and to a dataset of brain magnetic resonance images (MRIs) (p ≈ 3 million, n = 352). For the MRI dataset, our method allows for standard errors for the first 3 principal components based on 1000 bootstrap samples to be calculated on a standard laptop in 47 minutes, as opposed to approximately 4 days with standard methods.
Keywords: PCA; SVD; functional data analysis; image analysis; singular value decomposition.
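The subspace argument in the abstract can be illustrated with a short NumPy sketch. This is an illustration under stated assumptions, not the authors' implementation: the function names `fast_bootstrap_pca` and `pc_standard_errors`, the p x n data layout (one column per subject, rows centered), the re-centering of each bootstrap sample, and the sign convention are all choices made here. The original sample is decomposed once; each bootstrap sample then needs only an SVD of an n x n matrix of low dimensional coordinates, and pointwise standard errors for a component follow from the covariance of those coordinates without forming any p-dimensional bootstrap component.

```python
import numpy as np


def fast_bootstrap_pca(Y, n_boot=100, k=3, seed=0):
    """Bootstrap PCA in the n-dimensional subspace spanned by the sample.

    Y is assumed to be a p x n matrix (one column per subject) whose rows
    have already been centered. Returns V (p x n basis), A (n_boot x n x k
    low dimensional coordinates of the bootstrap components) and the
    resampled column indices. Bootstrap component j of sample b is
    V @ A[b][:, j]; it is never formed inside this function.
    """
    rng = np.random.default_rng(seed)
    p, n = Y.shape
    # One-time thin SVD of the original sample: Y = V @ diag(d) @ Ut.
    V, d, Ut = np.linalg.svd(Y, full_matrices=False)
    DUt = d[:, None] * Ut  # n x n coordinates of the subjects in span(V)

    A = np.empty((n_boot, n, k))
    idx_all = np.empty((n_boot, n), dtype=int)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample subjects with replacement
        low = DUt[:, idx]
        # Re-centering the bootstrap sample is also a low dimensional operation.
        low = low - low.mean(axis=1, keepdims=True)
        Ab, _, _ = np.linalg.svd(low, full_matrices=False)
        # Assumed sign convention: align component j with original component j,
        # whose coordinate vector in this basis is the j-th unit vector.
        signs = np.sign(np.diag(Ab)[:k])
        signs[signs == 0] = 1.0
        A[b] = Ab[:, :k] * signs
        idx_all[b] = idx
    return V, A, idx_all


def pc_standard_errors(V, A, j):
    """Pointwise bootstrap standard errors of component j (length p).

    Uses Var(V a) = diag(V Cov(a) V^T), so only the n x n covariance of the
    low dimensional coordinates is needed.
    """
    C = np.cov(A[:, :, j], rowvar=False)  # n x n
    var = np.einsum("ij,jk,ik->i", V, C, V)
    return np.sqrt(np.maximum(var, 0.0))
```

With `V, A, idx = fast_bootstrap_pca(Y, n_boot=1000, k=3)`, the call `pc_standard_errors(V, A, 0)` would return a length-p vector of standard errors for the first component. The per-sample cost depends on n rather than p, which is the source of the savings the abstract reports when p is much larger than n.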