PeerJ. 2022 Feb 17;10:e12967.
doi: 10.7717/peerj.12967. eCollection 2022.

PCAtest: testing the statistical significance of Principal Component Analysis in R

Free PMC article

Arley Camargo. PeerJ.

Abstract

Principal Component Analysis (PCA) is one of the most broadly used statistical methods for the ordination and dimensionality-reduction of multivariate datasets across many scientific disciplines. Trivial PCs can be estimated from data sets without any correlational structure among the original variables, and traditional criteria for selecting non-trivial PC axes are difficult to implement, partially subjective or based on ad hoc thresholds. PCAtest is an R package that implements permutation-based statistical tests to evaluate the overall significance of a PCA, the significance of each PC axis, and of contributions of each observed variable to the significant axes. Based on simulation and empirical results, I encourage R users to routinely apply PCAtest to test the significance of their PCA before proceeding with the direct interpretation of PC axes and/or the utilization of PC scores in subsequent evolutionary and ecological analyses.
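The idea of a permutation-based test for the overall significance of a PCA can be illustrated with a minimal sketch. This is not the PCAtest implementation or its R interface: the function names (`psi_statistic`, `permutation_pvalue`, `simulate`), the choice of Python with NumPy, and the test statistic (the sum of squared deviations of the correlation-matrix eigenvalues from 1) are all assumptions made here for illustration. The simulated data loosely mirror the figure settings of five variables and 100 observations.

```python
import numpy as np


def psi_statistic(data):
    """Sum of squared deviations of the correlation-matrix eigenvalues from 1.

    With no correlation among variables every eigenvalue is close to 1,
    so the statistic is close to 0. (Illustrative choice of statistic.)
    """
    corr = np.corrcoef(data, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)
    return float(np.sum((eigenvalues - 1.0) ** 2))


def permutation_pvalue(data, n_perm=1000, seed=0):
    """Compare the observed statistic with a permutation null distribution."""
    rng = np.random.default_rng(seed)
    observed = psi_statistic(data)
    exceed = 0
    for _ in range(n_perm):
        # Shuffle each column independently to destroy the correlations
        # among variables while keeping each variable's own distribution.
        shuffled = np.column_stack([rng.permutation(col) for col in data.T])
        if psi_statistic(shuffled) >= observed:
            exceed += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (exceed + 1) / (n_perm + 1)


def simulate(n_obs=100, n_vars=5, r=0.5, seed=1):
    """Draw multivariate normal data with a common pairwise correlation r."""
    rng = np.random.default_rng(seed)
    cov = np.full((n_vars, n_vars), r)
    np.fill_diagonal(cov, 1.0)
    return rng.multivariate_normal(np.zeros(n_vars), cov, size=n_obs)


correlated = simulate(r=0.5)
uncorrelated = simulate(r=0.0)

psi_corr, p_corr = permutation_pvalue(correlated)
psi_null, p_null = permutation_pvalue(uncorrelated)

print(f"correlated:   psi = {psi_corr:.3f}, p = {p_corr:.4f}")
print(f"uncorrelated: psi = {psi_null:.3f}, p = {p_null:.4f}")
```

In this sketch the correlated data should produce a statistic far outside the permutation distribution, whereas the uncorrelated data should not be consistently distinguishable from it. The same shuffling scheme could in principle be reused for per-axis statistics, such as the percentage of variance explained by each PC.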

Keywords: PCAtest; Permutation; Principal component analysis; R function; Statistical significance.


Conflict of interest statement

The author declared that they have no competing interests.

Figures

Figure 1. Null distributions and empirical statistics derived from PCAtest analysis of simulated data consisting of five uncorrelated variables and 100 observations.
Figure 2. Null distributions and empirical statistics derived from PCAtest analysis of simulated data consisting of five correlated variables (r = 0.25) and 100 observations. Lower plots show mean observed values (red dots), 95% confidence interval (CI) based on 1,000 bootstrap replicates (red bars), mean values and 95% CI based on 1,000 random permutations (gray dots and bars, respectively).
Figure 3. Null distributions and empirical statistics derived from PCAtest analysis of simulated data consisting of five correlated variables (r = 0.50) and 100 observations. Lower plots show mean observed values (red dots), 95% confidence interval (CI) based on 1,000 bootstrap replicates (red bars), mean values and 95% CI based on 1,000 random permutations (gray dots and bars, respectively).
Figure 4. Null distributions and empirical statistics derived from PCAtest analysis of seven morphological variables measured in 29 ant species (data from Wong & Carmona (2021)). Lower plots show mean observed values (red dots), 95% confidence interval (CI) based on 1,000 bootstrap replicates (red bars), mean values and 95% CI based on 1,000 random permutations (gray dots and bars, respectively).
Figure 5. Results of R-mode PCAtest analysis of microarray data from Ringnér (2008). Null distributions and empirical statistics derived from PCAtest analysis of 8,534 genes screened in 105 samples (data from Ringnér, 2008). Lower plots show mean observed values (red dots), 95% confidence interval (CI) based on 1,000 bootstrap replicates (red bars), mean values and 95% CI based on 1,000 random permutations (gray dots and bars, respectively).
Figure 6. Results of Q-mode PCAtest analysis of microarray data from Ringnér (2008). Null distributions and empirical statistics derived from PCAtest analysis of 8,534 genes (observations) screened in 105 samples (variables). The original data set of Ringnér (2008) was transposed to perform a Q-mode PCA analysis. Lower plots show mean observed values (red dots), 95% confidence interval (CI) based on 1,000 bootstrap replicates (red bars), mean values and 95% CI based on 1,000 random permutations (gray dots and bars, respectively).

Grants and funding

This work was supported by the Programa de Desarrollo de las Ciencias Básicas (PEDECIBA, Uruguay) and the Sistema Nacional de Investigadores, Agencia Nacional de Investigación e Innovación (SNI-ANII, Uruguay). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
