A comparison of partial least squares (PLS) and sparse PLS regressions in genomic selection in French dairy cattle

J Dairy Sci. 2012 Apr;95(4):2120-31. doi: 10.3168/jds.2011-4647.


Genomic selection involves computing a prediction equation from the estimated effects of a large number of DNA markers, based on a limited number of genotyped animals with phenotypes. Because the number of observations is much smaller than the number of independent variables, the challenge is to find methods that perform well in this context. Partial least squares (PLS) regression and sparse PLS were applied to a reference population of 3,940 genotyped and phenotyped French Holstein bulls and 39,738 polymorphic single nucleotide polymorphism (SNP) markers. PLS regression reduces the number of variables by projecting the independent variables onto latent structures. Sparse PLS combines variable selection and modeling in a one-step procedure. Correlations between observed phenotypes and phenotypes predicted by PLS and sparse PLS were similar, but sparse PLS highlighted some genome regions more clearly. Both PLS and sparse PLS were more accurate than pedigree-based best linear unbiased prediction (BLUP), but they generally provided lower correlations between observed and predicted phenotypes than did genomic BLUP. Furthermore, PLS and sparse PLS required computing time similar to that of genomic BLUP for the study of 6 traits.
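
As a rough illustration of the two methods described in the abstract, the Python sketch below fits a single-trait PLS by successive deflation, plus a sparse variant obtained by soft-thresholding the weight vector. It is not the authors' implementation. The simulated genotypes, the three causal markers, the `frac` threshold, and all function names are assumptions made for this example, and soft-thresholding is only one common way of inducing sparsity. Accuracy is measured, as in the abstract, by the correlation between observed and predicted phenotypes.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def soft_threshold(w, frac):
    """Shrink each weight by frac * max|w|; weights below the cut become 0."""
    cut = frac * max(abs(v) for v in w)
    return [math.copysign(max(abs(v) - cut, 0.0), v) for v in w]


def fit_pls1(X, y, n_comp, frac=0.0):
    """Single-response PLS by deflation; frac > 0 gives a sparse variant."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered markers
    f = [v - y_mean for v in y]                                # centered phenotype
    comps = []
    for _ in range(n_comp):
        # Weight of each marker: its covariance with the current residual.
        w = [dot([E[i][j] for i in range(n)], f) for j in range(p)]
        w = soft_threshold(w, frac)
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]          # latent score of each animal
        tt = dot(t, t)
        if tt < 1e-12:
            break
        load = [dot([E[i][j] for i in range(n)], t) / tt for j in range(p)]
        q = dot(f, t) / tt                      # regression of phenotype on score
        # Deflate: remove what this component explains from markers and phenotype.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, load, q))
    return x_mean, y_mean, comps


def predict(model, X):
    """Apply the same projection and deflation steps to new genotypes."""
    x_mean, y_mean, comps = model
    preds = []
    for row in X:
        e = [v - m for v, m in zip(row, x_mean)]
        yhat = y_mean
        for w, load, q in comps:
            t = dot(e, w)
            yhat += q * t
            e = [ev - t * lv for ev, lv in zip(e, load)]
        preds.append(yhat)
    return preds


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    return dot(da, db) / math.sqrt(dot(da, da) * dot(db, db))


# Simulated toy data (hypothetical): 60 animals, 100 markers coded 0/1/2,
# three causal markers, so markers outnumber training animals.
random.seed(0)
n_animals, n_markers = 60, 100
effects = {5: 1.0, 40: -0.8, 77: 0.6}
X_all = [[random.choice((0, 1, 2)) for _ in range(n_markers)]
         for _ in range(n_animals)]
y_all = [sum(b * row[j] for j, b in effects.items()) + random.gauss(0.0, 0.5)
         for row in X_all]
X_train, y_train = X_all[:40], y_all[:40]
X_valid, y_valid = X_all[40:], y_all[40:]

model_pls = fit_pls1(X_train, y_train, n_comp=3)
model_spls = fit_pls1(X_train, y_train, n_comp=3, frac=0.5)

print("PLS validation correlation:       ", pearson(y_valid, predict(model_pls, X_valid)))
print("sparse PLS validation correlation:", pearson(y_valid, predict(model_spls, X_valid)))
print("nonzero weights, component 1 (sparse):",
      sum(1 for v in model_spls[2][0][0] if v != 0))
```

In the sparse fit, markers whose weight falls below the cut receive a weight of exactly zero, so the remaining nonzero weights point to a smaller set of markers. This is one way to picture how a sparse model can highlight genome regions more clearly. The number of components and the threshold are tuning choices and would normally be selected by cross-validation.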

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Breeding
  • Cattle / genetics*
  • Dairying
  • Female
  • Fertilization / genetics
  • France
  • Genotype
  • Lactation / genetics
  • Least-Squares Analysis*
  • Male
  • Milk / chemistry
  • Pedigree
  • Phenotype
  • Pregnancy
  • Quantitative Trait, Heritable
  • Regression Analysis*
  • Reproducibility of Results
  • Selection, Genetic*