J Am Stat Assoc. 2016;111(514):846-860.
doi: 10.1080/01621459.2015.1062383. Epub 2016 Aug 18.

Fast, Exact Bootstrap Principal Component Analysis for p > 1 million

Aaron Fisher et al. J Am Stat Assoc. 2016.

Abstract

Many have suggested a bootstrap procedure for estimating the sampling variability of principal component analysis (PCA) results. However, when the number of measurements per subject (p) is much larger than the number of subjects (n), calculating and storing the leading principal components from each bootstrap sample can be computationally infeasible. To address this, we outline methods for fast, exact calculation of bootstrap principal components, eigenvalues, and scores. Our methods leverage the fact that all bootstrap samples occupy the same n-dimensional subspace as the original sample. As a result, all bootstrap principal components are limited to the same n-dimensional subspace and can be efficiently represented by their low dimensional coordinates in that subspace. Several uncertainty metrics can be computed solely based on the bootstrap distribution of these low dimensional coordinates, without calculating or storing the p-dimensional bootstrap components. Fast bootstrap PCA is applied to a dataset of sleep electroencephalogram recordings (p = 900, n = 392), and to a dataset of brain magnetic resonance images (MRIs) (p ≈ 3 million, n = 352). For the MRI dataset, our method allows for standard errors for the first 3 principal components based on 1000 bootstrap samples to be calculated on a standard laptop in 47 minutes, as opposed to approximately 4 days with standard methods.
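
The subspace argument in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `fast_bootstrap_pca` and `pc_standard_errors`, the sign-alignment rule, and the choice to recenter each bootstrap sample are assumptions made for illustration. The data matrix `Y` is taken to be p × n (rows are measurements, columns are subjects). Writing the centered data as Y = V D Uᵀ, each bootstrap sample has coordinates in the basis V given by resampled columns of D Uᵀ, so only an n × n SVD is needed per bootstrap sample.

```python
import numpy as np

def fast_bootstrap_pca(Y, n_boot=200, k=3, seed=0):
    """Sketch: bootstrap PCA via low-dimensional coordinates (assumed layout: Y is p x n)."""
    rng = np.random.default_rng(seed)
    n = Y.shape[1]
    Yc = Y - Y.mean(axis=1, keepdims=True)             # center each measurement
    V, d, Ut = np.linalg.svd(Yc, full_matrices=False)  # the only p-dimensional SVD
    DUt = d[:, None] * Ut                              # subject coordinates in basis V
    m = d.size
    A = np.empty((n_boot, m, k))                       # low-dimensional bootstrap PCs
    S = np.empty((n_boot, k))                          # bootstrap singular values
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)               # resample subjects with replacement
        Zb = DUt[:, idx]
        Zb = Zb - Zb.mean(axis=1, keepdims=True)       # recenter the bootstrap sample
        Ab, sb, _ = np.linalg.svd(Zb, full_matrices=False)
        # The original k-th PC has coordinates e_k, so align signs using diag(Ab).
        signs = np.where(np.diag(Ab)[:k] < 0, -1.0, 1.0)
        A[b] = Ab[:, :k] * signs
        S[b] = sb[:k]
    return V, d, A, S

def pc_standard_errors(V, A, j):
    """Elementwise bootstrap standard errors of PC j without forming p-dimensional draws."""
    C = np.cov(A[:, :, j], rowvar=False)               # m x m covariance of coordinates
    var = np.sum((V @ C) * V, axis=1)                  # diag(V C V^T)
    return np.sqrt(np.maximum(var, 0.0))

# Usage on simulated data (sizes are arbitrary):
rng = np.random.default_rng(42)
Y = rng.normal(size=(5000, 20))
V, d, A, S = fast_bootstrap_pca(Y, n_boot=100, k=3)
se_pc1 = pc_standard_errors(V, A, 0)                   # length-p vector of standard errors
boot_eigenvalues = S**2 / (Y.shape[1] - 1)             # bootstrap covariance eigenvalues
```

A p-dimensional bootstrap PC can be recovered on demand as `V @ A[b][:, j]`. The standard errors use only V and the covariance of the low-dimensional coordinates, which is what keeps memory use modest when p is very large.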

Keywords: PCA; SVD; functional data analysis; image analysis; singular value decomposition.

Figures

Figure 1
Summary of EEG dataset - The left panel shows examples of normalized δ power (NPδ) over the course of the night for five subjects, as well as the mean NPδ function across all subjects (μ). The right panel shows the first five PCs of the dataset.
Figure 2
Coverage across simulation scenarios - The (3 × 2) array of plots on the left shows the median coverage rate across all p estimated CIs for the PC elements (p = 900). Rows correspond to the PC being estimated. Simulation cases using the empirical eigenvalue spacing are shown in the left column, and simulation cases where each PC explains half as much as the previous PC are shown in the right column. The (3 × 2) array of plots on the right shows coverage for CRs for the PCs.
Figure 3
Bootstrap PC variability - Each column of plots corresponds to a different PC: the first, second, or third. The top row shows the fitted principal components in the original high-dimensional space ($V_{[,k]}$ for $k = 1, 2, 3$), along with pointwise confidence intervals and 30 draws from the bootstrap distribution. The bottom row shows the same information for the low-dimensional representation of the bootstrap PCs ($A^b_{[,k]}$ for $k = 1, 2, 3$). In the bottom row, the thick black line corresponds to the case $A^b_{[,k]} = I_{n[,k]}$, where $I_{n[,k]}$ is the $k$th column of the $n \times n$ identity matrix, so that $V^b_{[,k]} = V A^b_{[,k]} = V_{[,k]}$.
Figure 4
Bootstrap eigenvalue distribution - For both the EEG and MRI datasets, we show the bootstrap distribution of the first three eigenvalues of the sample covariance matrix. Tick marks show the eigenvalues from the original sample covariance matrix.
Figure 5
Fitted sample values, bootstrap standard errors, and Z-scores for the MRI PCs - The voxelwise values for the PCs and Z-scores (top and bottom rows) have been binned, and shaded according to the value of their corresponding bin's midpoint. This allows us to visually show both sign (color) and magnitude (opacity). Because the standard errors (middle row) are always positive, the binning procedure is not necessary, and the voxels are shaded on a continuous scale.
Figure 6
Low dimensional CIs for the MRI PCs - Moment-based CIs, percentile CIs, and 30 random bootstrap draws for $A^b_{[1:15,k]}$, where $k = 1, 2$, and $3$.
