In this work, we investigated the suitability of partial least squares regression (PLSR) for identifying marker-trait associations in genotype-phenotype datasets. We used data collected on a cotton (Gossypium hirsutum L.) recombinant inbred line (RIL) mapping population that was evaluated under two contrasting irrigation treatments (well-watered and water-limited) in a hot, arid environment in 2012. Two phenotypic datasets were used in combination with the genetic data, which consisted of 841 marker loci assigned to 117 linkage groups. The first dataset contained canopy traits gathered using a mobile, high-throughput phenotyping platform and included canopy temperature (CT), normalized difference vegetation index (NDVI), and canopy height (CHT), with leaf area index (LAI) derived from the NDVI and CHT measurements. The second dataset consisted of concentration measurements for 14 elements: P, K, Ca, Mn, Fe, Zn, Ni, Cu, As, Co, Rb, Mo, S, and Mg. To conduct the PLSR analyses, we used the "pls" and "plsdepot" packages available in R statistical software version 3.2.4. The PLSR biplot from the analysis of the first dataset showed that three of the four canopy traits (LAI, NDVI, and CHT) were highly correlated, and by using multivariate analysis of variance (MANOVA), we detected 22 significant (p < 0.01) marker-trait associations for the four traits. In contrast to the canopy trait analysis, the PLSR biplot for the second dataset showed varying correlations among the 14 traits. Because of the lack of distinct trait similarities, MANOVA was not an ideal option for testing marker-trait associations, so we implemented a jackknife resampling technique. Jackknife resampling failed to detect significant marker effects for several of the 14 elemental concentration traits.
Thus, our future work aims to test other resampling techniques, such as bootstrapping, for traits that do not exhibit high correlation. Overall, PLSR was an informative way to examine data structure, displaying correlations within markers, within traits, and between markers and traits in a single biplot. Further studies are still needed to improve the detection of additional variance in correlated datasets and to guard against spurious results. To the best of our knowledge, this is the first time PLSR has been reported in such a context.
Keywords: Marker-trait association; Multivariate analyses; PLSR; Plant methods.