Data integration in plant biology: the O2PLS method for combined modeling of transcript and metabolite data

Plant J. 2007 Dec;52(6):1181-91. doi: 10.1111/j.1365-313X.2007.03293.x. Epub 2007 Oct 10.


Technological advances in life-science instrumentation have enabled the collection of very large quantities of data from multiple sources. By gathering data from several analytical platforms, with the aim of monitoring, e.g., transcriptomic, metabolomic or proteomic events in parallel, one hopes to answer biological questions and explain biological observations. This 'systems biology' approach typically relies on advanced statistics to facilitate interpretation of the data. In the present study, we demonstrate that the O2PLS multivariate regression method can be used for combining 'omics' types of data. With this methodology, systematic variation that overlaps across analytical platforms can be separated from platform-specific systematic variation. A study of Populus tremula × Populus tremuloides, investigating short-day-induced effects at transcript and metabolite levels, is employed to demonstrate the benefits of the methodology. We show how the models can be validated and interpreted to identify biologically relevant events, and discuss the results in relation to a pairwise univariate correlation approach and principal component analysis.
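The contrast between pairwise correlation and a joint two-block view can be illustrated with a minimal sketch. It is not the O2PLS algorithm and does not use the study's data: the matrices `X` (transcripts) and `Y` (metabolites) are synthetic, the function names are hypothetical, and the joint directions come from a plain singular value decomposition of the cross-covariance matrix, which is assumed here only as a simple stand-in for the idea of variation shared across platforms. No platform-specific variation is removed, which is the step that a full O2PLS model adds.

```python
import numpy as np


def pairwise_correlations(X, Y):
    """Pearson correlation between every column of X and every column of Y."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Xs = Xc / np.linalg.norm(Xc, axis=0)
    Ys = Yc / np.linalg.norm(Yc, axis=0)
    return Xs.T @ Ys  # shape: (n_transcripts, n_metabolites)


def joint_scores(X, Y, n_joint=1):
    """Scores along the leading singular directions of the cross-covariance X^T Y."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    W, _, Ct = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
    W, C = W[:, :n_joint], Ct.T[:, :n_joint]
    return Xc @ W, Yc @ C


# Synthetic example (assumption): one factor shared by both blocks,
# plus one factor that appears only in the transcript block.
rng = np.random.default_rng(0)
n = 40
shared = rng.normal(size=(n, 1))
x_only = rng.normal(size=(n, 1))
X = (shared @ rng.normal(size=(1, 30))
     + x_only @ rng.normal(size=(1, 30))
     + 0.1 * rng.normal(size=(n, 30)))
Y = shared @ rng.normal(size=(1, 12)) + 0.1 * rng.normal(size=(n, 12))

R = pairwise_correlations(X, Y)   # one coefficient per transcript-metabolite pair
T, U = joint_scores(X, Y)         # one joint score vector per block
```

The pairwise approach yields one coefficient per transcript-metabolite pair (here 30 × 12), whereas the joint view summarizes the shared pattern in a single pair of score vectors. In this sketch the transcript scores `T` can still carry part of the transcript-only factor.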

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods
  • Genomics / methods
  • Models, Biological*
  • Multivariate Analysis
  • Plants / genetics*
  • Plants / metabolism*
  • Populus / genetics
  • Populus / metabolism
  • Proteomics / methods
  • Regression Analysis
  • Systems Biology / methods*