PC-DOT: Improving genomic prediction ability of principal component regression by DOT product

Anim Genet. 2022 Dec;53(6):888-891. doi: 10.1111/age.13255. Epub 2022 Sep 27.

Abstract

Principal component regression (PC regression) is a prediction method built on a dimension-reduction strategy. Typically, the principal components (PCs) are added to the regression model one by one in order of decreasing eigenvalue (PC-Eigen). Because some PCs with large eigenvalues may be only weakly associated with the response variable, PC-Eigen may not be the best framework. Researchers have previously tried adding PCs to the model according to their contribution to the regression sum of squares (PC-SS), but found that PC-SS generally performs worse than PC-Eigen. Selecting the optimal set of PCs therefore remains a challenge, and no standard approach exists. Here, drawing on cosine similarity, we postulated that PCs could be ranked by their dot product with the response variable, and that this framework (which we call PC-DOT) would preferentially extract PCs that are both highly correlated with the response variable and have a large eigenvalue. Using one simulated and three real genomic datasets (15 traits in total), we tested the prediction ability of the different frameworks. In general, PC-DOT performed better than both PC-Eigen and PC-SS. To facilitate the application of PC regression, we provide R scripts for the different frameworks (https://github.com/SUNHAO-JLU/Genome_Prediction-PC_DOT). In addition, the hat matrix was used to reduce the computational complexity in the reference data during cross-validation. Our work may help researchers to better understand and carry out PC regression.
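The ranking idea can be illustrated with a minimal sketch. It is a reading of the abstract, not the authors' R implementation: it assumes that PC-DOT orders PCs by the absolute dot product between each PC score vector and the centered response. The function name `rank_pcs` and the simulated data are hypothetical. Because the dot product equals the score norm (proportional to the square root of the eigenvalue) times the norm of the response times the cosine of the angle between them, a PC ranks high only when it has both a large eigenvalue and a strong correlation with the response.

```python
import numpy as np

def rank_pcs(X, y):
    """Order PCs by eigenvalue (PC-Eigen) and by |dot product| with y (assumed PC-DOT)."""
    Xc = X - X.mean(axis=0)              # center the marker matrix
    yc = y - y.mean()                    # center the response
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U * S                       # column j holds the scores of PC j
    eigenvalues = S**2 / (X.shape[0] - 1)
    dots = scores.T @ yc                 # dot product of each PC with the response
    eigen_order = np.argsort(-eigenvalues)
    dot_order = np.argsort(-np.abs(dots))
    return scores, eigenvalues, dots, eigen_order, dot_order

# Hypothetical simulated data: 60 individuals, 8 predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
y = X @ rng.normal(size=8) + rng.normal(size=60)

scores, eigenvalues, dots, eigen_order, dot_order = rank_pcs(X, y)

# Cosine between each PC and the centered response; with centered
# vectors this is the Pearson correlation.
yc = y - y.mean()
cosines = dots / (np.linalg.norm(scores, axis=0) * np.linalg.norm(yc))

print("PC-Eigen order:", eigen_order)
print("PC-DOT order:  ", dot_order)
```

PCs would then be added to the regression in `dot_order` rather than `eigen_order`, with the number of PCs retained chosen by cross-validation.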

Keywords: dot product; eigenvalue; principal component regression.

MeSH terms

  • Animals
  • Genome*
  • Genomics*
  • Phenotype
  • Principal Component Analysis