Factor selection and structural identification in the interaction ANOVA model

Biometrics. 2013 Mar;69(1):70-9. doi: 10.1111/j.1541-0420.2012.01810.x. Epub 2013 Jan 17.


When faced with categorical predictors and a continuous response, the objective of an analysis often consists of two tasks: finding which factors are important and determining which levels of the factors differ significantly from one another. Often times, these tasks are done separately using Analysis of Variance (ANOVA) followed by a post hoc hypothesis testing procedure such as Tukey's Honestly Significant Difference test. When interactions between factors are included in the model the collapsing of levels of a factor becomes a more difficult problem. When testing for differences between two levels of a factor, claiming no difference would refer not only to equality of main effects, but also to equality of each interaction involving those levels. This structure between the main effects and interactions in a model is similar to the idea of heredity used in regression models. This article introduces a new method for accomplishing both of the common analysis tasks simultaneously in an interaction model while also adhering to the heredity-type constraint on the model. An appropriate penalization is constructed that encourages levels of factors to collapse and entire factors to be set to zero. It is shown that the procedure has the oracle property implying that asymptotically it performs as well as if the exact structure were known beforehand. We also discuss the application to estimating interactions in the unreplicated case. Simulation studies show the procedure outperforms post hoc hypothesis testing procedures as well as similar methods that do not include a structural constraint. The method is also illustrated using a real data example.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Age Factors
  • Analysis of Variance*
  • Computer Simulation
  • Humans
  • Memory
  • Models, Statistical*