Many advanced metabolomics experiments currently lead to data where a large number of response variables were measured while one or several factors were changed. Often the number of response variables vastly exceeds the sample size and well-established techniques such as multivariate analysis of variance (MANOVA) cannot be used to analyze the data. ANOVA simultaneous component analysis (ASCA) is an alternative to MANOVA for analysis of metabolomics data from an experimental design. In this paper, we show that ASCA assumes that none of the metabolites are correlated and that they all have the same variance. Because of these assumptions, ASCA may relate the wrong variables to a factor. This reduces the power of the method and hampers interpretation. We propose an improved model that is essentially a weighted average of the ASCA and MANOVA models. The optimal weight is determined in a data-driven fashion. Compared to ASCA, this method assumes that variables can correlate, leading to a more realistic view of the data. Compared to MANOVA, the model is also applicable when the number of samples is (much) smaller than the number of variables. These advantages are demonstrated by means of simulated and real data examples. The source code of the method is available from the first author upon request, and at the following github repository: https://github.com/JasperE/regularized-MANOVA.
Keywords: Analysis of variance – simultaneous component analysis; Experimental design; High dimensional data; Metabolomics; Multivariate analysis of variance; Regularized multivariate analysis of variance.
Copyright © 2015 Elsevier B.V. All rights reserved.