Prediction of clinical outcome with microarray data: a partial least squares discriminant analysis (PLS-DA) approach

Hum Genet. 2003 May;112(5-6):581-92. doi: 10.1007/s00439-003-0921-9. Epub 2003 Feb 27.


Partial least squares discriminant analysis (PLS-DA) is a partial least squares regression of a set Y of binary variables describing the categories of a categorical variable on a set X of predictor variables. It is a compromise between the usual discriminant analysis and a discriminant analysis on the significant principal components of the predictor variables. This technique is especially suited to deal with a much larger number of predictors than observations and with multicollinearity, two of the main problems encountered when analysing microarray expression data. We explore the performance of PLS-DA with published data from breast cancer (Perou et al. 2000). Several such analyses were carried out: (1) before vs after chemotherapy treatment, (2) estrogen receptor positive vs negative tumours, and (3) tumour classification. We found that the performance of PLS-DA was extremely satisfactory in all cases and that the discriminant cDNA clones often had a sound biological interpretation. We conclude that PLS-DA is a powerful yet simple tool for analysing microarray data.
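
The definition above (a PLS regression of binary class indicators Y on predictors X) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`one_hot`, `fit_plsda`, `predict_plsda`), the synthetic data, the number of components, and the choice of the leading singular vector of X'Y as the weight vector are all assumptions made for illustration. The data are random numbers with more predictors than observations, not the Perou et al. measurements.

```python
import numpy as np


def one_hot(labels, n_classes):
    """Build the binary indicator matrix Y (one column per category)."""
    Y = np.zeros((len(labels), n_classes))
    Y[np.arange(len(labels)), labels] = 1.0
    return Y


def fit_plsda(X, labels, n_classes, n_components):
    """PLS regression of the indicator matrix Y on X (hypothetical helper)."""
    Y = one_hot(labels, n_classes)
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xk, Yk = X - x_mean, Y - y_mean  # centred working copies, deflated below
    W, P, Q = [], [], []
    for _ in range(n_components):
        # Weight vector: leading left singular vector of the cross-product X'Y.
        u, s, vt = np.linalg.svd(Xk.T @ Yk, full_matrices=False)
        w = u[:, 0]
        t = Xk @ w                   # latent score for this component
        tt = t @ t
        p = Xk.T @ t / tt            # X loadings
        q = Yk.T @ t / tt            # Y loadings
        Xk = Xk - np.outer(t, p)     # remove the explained part of X
        Yk = Yk - np.outer(t, q)     # remove the explained part of Y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q).T
    # Regression coefficients expressed in the original predictor space.
    B = W @ np.linalg.inv(P.T @ W) @ Q.T
    return x_mean, y_mean, B


def predict_plsda(X, x_mean, y_mean, B):
    """Assign each sample to the category with the largest fitted indicator."""
    scores = (X - x_mean) @ B + y_mean
    return scores.argmax(axis=1)


# Synthetic example (assumed data): 30 samples, 200 predictors, 3 classes.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 10)
X = rng.normal(size=(30, 200))
for k in range(3):
    # Each class is shifted on its own block of 10 predictors.
    X[labels == k, 10 * k:10 * k + 10] += 3.0

x_mean, y_mean, B = fit_plsda(X, labels, n_classes=3, n_components=2)
predicted = predict_plsda(X, x_mean, y_mean, B)
accuracy = (predicted == labels).mean()
print("training accuracy:", accuracy)
```

Because only a few latent components are extracted, the fit stays well defined even though there are far more predictors than observations. The accuracy printed here is measured on the training samples, so it is optimistic; an honest estimate of predictive performance would need cross-validation or a held-out set.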

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Data Interpretation, Statistical*
  • Drug Therapy
  • Humans
  • Least-Squares Analysis*
  • Neoplasms / classification
  • Neoplasms / metabolism
  • Oligonucleotide Array Sequence Analysis*
  • Receptors, Estrogen / metabolism

Substances

  • Receptors, Estrogen