Principals about principal components in statistical genetics

Brief Bioinform. 2019 Nov 27;20(6):2200-2216. doi: 10.1093/bib/bby081.


Principal components (PCs) are widely used in statistics and refer to a relatively small number of uncorrelated variables derived from an initial pool of variables, while explaining as much of the total variance as possible. Also in statistical genetics, principal component analysis (PCA) is a popular technique. To achieve optimal results, a thorough understanding about the different implementations of PCA is required and their impact on study results, compared to alternative approaches. In this review, we focus on the possibilities, limitations and role of PCs in ancestry prediction, genome-wide association studies, rare variants analyses, imputation strategies, meta-analysis and epistasis detection. We also describe several variations of classic PCA that deserve increased attention in statistical genetics applications.

Keywords: exploration and prediction; population stratification; principal component analysis; statistical genetics.

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Animals
  • Humans
  • Models, Statistical*
  • Principal Component Analysis*