Analysis of genetic marker-phenotype relationships by jack-knifed partial least squares regression (PLSR)

Hereditas. 2004;141(2):149-65. doi: 10.1111/j.1601-5223.2004.01816.x.

Abstract

The utility of a relatively new multivariate method, bi-linear modelling by cross-validated partial least squares regression (PLSR), was investigated in the analysis of quantitative trait loci (QTL). The distinguishing feature of PLSR is that it reveals reliable covariance structures between data of different types measured on the same set of objects. Two matrices X (here: genetic markers) and Y (here: phenotypes) are interactively decomposed into latent variables (PLS components, or PCs) in a way that facilitates statistically reliable and graphically interpretable model building. Natural collinearities between input variables are utilized actively to stabilise the modelling, instead of being treated as a statistical problem. The importance of cross-validation/jack-knifing as an intuitively appealing way to avoid overfitting is emphasized. Two data sets from chromosomal mapping studies of different complexity were chosen for illustration (QTL for tomato yield and for oat heading date). For these data sets, results from PLSR analysis were compared with published results and with results obtained using the package PLABQTL. In all cases PLSR gave explained validation variances at least similar to those of the reported studies. An attractive feature is that PLSR allows the analysis of several traits/replicates in one analysis, and the direct visual identification of individuals with desirable marker genotypes. It is suggested that PLSR may be useful in structural and functional genomics and in marker assisted selection, particularly in cases with a limited number of objects.
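
The general idea of jack-knifed PLSR can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes a single phenotype (PLS1 fitted by NIPALS), leave-one-out cross-validation, and one common form of the jack-knife standard error for the regression coefficients. The toy marker matrix, the phenotype, and all function names (`pls1_fit`, `jackknife_pls`) are hypothetical and chosen only for illustration.

```python
def center(rows):
    """Column-center a list of rows; return centered rows and column means."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(r, means)] for r in rows], means


def pls1_fit(X, y, n_comp):
    """Fit single-response PLSR by NIPALS; return (coefficients, intercept)."""
    Xc, x_mean = center(X)
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]
    k = len(x_mean)
    comps = []
    for _ in range(n_comp):
        # Weight vector: direction in X with maximal covariance with y.
        w = [sum(r[j] * v for r, v in zip(Xc, yc)) for j in range(k)]
        norm = sum(v * v for v in w) ** 0.5
        if norm == 0:
            break
        w = [v / norm for v in w]
        t = [sum(a * c for a, c in zip(r, w)) for r in Xc]  # scores
        tt = sum(v * v for v in t)
        p = [sum(r[j] * s for r, s in zip(Xc, t)) / tt for j in range(k)]
        q = sum(a * s for a, s in zip(yc, t)) / tt
        # Deflate X and y before extracting the next component.
        Xc = [[r[j] - s * p[j] for j in range(k)] for r, s in zip(Xc, t)]
        yc = [v - q * s for v, s in zip(yc, t)]
        comps.append((w, p, q))
    # The model is linear, so coefficient j is the response to unit vector e_j.
    b = []
    for j in range(k):
        x = [1.0 if i == j else 0.0 for i in range(k)]
        out = 0.0
        for w, p, q in comps:
            s = sum(a * c for a, c in zip(x, w))
            out += q * s
            x = [a - s * c for a, c in zip(x, p)]
        b.append(out)
    intercept = y_mean - sum(bj * m for bj, m in zip(b, x_mean))
    return b, intercept


def jackknife_pls(X, y, n_comp):
    """Leave-one-out cross-validation with jack-knifed coefficient t-values."""
    n, k = len(y), len(X[0])
    b_full, _ = pls1_fit(X, y, n_comp)
    press, b_sub = 0.0, []
    for i in range(n):
        b, b0 = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_comp)
        pred = b0 + sum(bj * xj for bj, xj in zip(b, X[i]))
        press += (y[i] - pred) ** 2
        b_sub.append(b)
    y_mean = sum(y) / n
    ss_tot = sum((v - y_mean) ** 2 for v in y)
    q2 = 1.0 - press / ss_tot  # cross-validated explained variance
    # Assumed jack-knife variance: (n-1)/n * sum of squared deviations
    # of the sub-model coefficients from the full-model coefficients.
    se = [((n - 1) / n * sum((b_full[j] - b[j]) ** 2 for b in b_sub)) ** 0.5
          for j in range(k)]
    t_vals = [bj / s if s > 0 else float("inf") for bj, s in zip(b_full, se)]
    return b_full, t_vals, q2


# Hypothetical toy data: 16 objects, 4 binary markers; only marker 0
# affects the phenotype, plus a small deterministic perturbation.
X = [[float((i >> j) & 1) for j in range(4)] for i in range(16)]
y = [2.0 * X[i][0] + 0.05 * ((7 * i) % 5 - 2) for i in range(16)]

b, t_vals, q2 = jackknife_pls(X, y, n_comp=1)
print("coefficients:", [round(v, 3) for v in b])
print("jack-knife t-values:", [round(v, 2) for v in t_vals])
print("cross-validated explained variance:", round(q2, 3))
```

In this sketch a marker is flagged as relevant when its coefficient is large relative to its jack-knife standard error, while the cross-validated explained variance guards against choosing too many components. Extending it to several traits at once (PLS2) would require a matrix-valued Y, which is omitted here.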

MeSH terms

  • Crosses, Genetic
  • Data Interpretation, Statistical
  • Genetic Markers*
  • Least-Squares Analysis
  • Models, Genetic
  • Phenotype*
  • Quantitative Trait Loci
  • Regression Analysis
  • Solanum lycopersicum / genetics

Substances

  • Genetic Markers