Feature selection in feature network models: finding predictive subsets of features with the Positive Lasso

Br J Math Stat Psychol. 2008 May;61(Pt 1):1-27. doi: 10.1348/000711006X119365.

Abstract

A set of features is the basis for the network representation of proximity data achieved by feature network models (FNMs). Features are binary variables that characterize the objects in an experiment, with some measure of proximity as the response variable. Sometimes features are provided by theory and play an important role in the construction of the experimental conditions. In some research settings, however, the features are not known a priori. This paper shows how to generate features in this situation and how to select an adequate subset of features that strikes a good compromise between model fit and model complexity, using a new version of least angle regression that restricts coefficients to be non-negative, called the Positive Lasso. It will be shown that features can be generated efficiently with Gray codes, which are naturally linked to FNMs. The model selection strategy makes use of the fact that an FNM can be considered a univariate multiple regression model. A simulation study shows that the proposed strategy leads to satisfactory results if the number of objects is less than or equal to 22. If the number of objects is larger than 22, the number of features selected by our method exceeds the true number of features in some conditions.
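The two steps named in the abstract, generating candidate features with Gray codes and selecting a subset under a non-negativity constraint, can be illustrated with a minimal sketch. Several things here are assumptions rather than details taken from the abstract: the distance between two objects is taken to be a weighted sum of the absolute feature differences |e_i - e_j|, a feature and its complement are treated as equivalent, and the non-negative lasso problem is solved by plain coordinate descent instead of the modified least angle regression the paper describes. All function names and the penalty value `lam` are hypothetical.

```python
from itertools import combinations


def gray_codes(n_bits):
    """Yield all binary tuples of length n_bits in reflected Gray code order."""
    for k in range(1 << n_bits):
        g = k ^ (k >> 1)  # standard binary-reflected Gray code of k
        yield tuple((g >> b) & 1 for b in reversed(range(n_bits)))


def candidate_features(n_objects):
    """Enumerate candidate binary features over the objects.

    Assumption: constant vectors carry no information, and a feature and
    its complement give the same |e_i - e_j| column, so one of each pair is kept.
    """
    seen = set()
    feats = []
    for code in gray_codes(n_objects):
        if sum(code) in (0, n_objects):
            continue  # all-zero or all-one vector
        if tuple(1 - b for b in code) in seen:
            continue  # complement already kept
        seen.add(code)
        feats.append(code)
    return feats


def difference_column(feature):
    """Regression column: |e_i - e_j| for every object pair i < j."""
    return [abs(feature[i] - feature[j])
            for i, j in combinations(range(len(feature)), 2)]


def positive_lasso_cd(X_cols, y, lam, n_iter=500):
    """Minimise 0.5*||y - Xb||^2 + lam*sum(b) subject to b >= 0.

    Coordinate descent stand-in (an assumption), not the paper's LARS variant.
    X_cols is a list of columns, each a list with one entry per object pair.
    """
    b = [0.0] * len(X_cols)
    resid = list(y)  # residual y - Xb, with b = 0 initially
    norms = [sum(v * v for v in col) for col in X_cols]
    for _ in range(n_iter):
        for t, col in enumerate(X_cols):
            if norms[t] == 0:
                continue
            # Correlation of column t with the residual that excludes column t.
            rho = sum(c * r for c, r in zip(col, resid)) + norms[t] * b[t]
            new = max(0.0, (rho - lam) / norms[t])  # clip at zero
            delta = new - b[t]
            if delta != 0.0:
                resid = [r - delta * c for r, c in zip(resid, col)]
                b[t] = new
    return b


if __name__ == "__main__":
    feats = candidate_features(5)
    X = [difference_column(f) for f in feats]
    # Synthetic dissimilarities built from two of the candidate features.
    y = [2.0 * a + 1.0 * c for a, c in zip(X[0], X[3])]
    weights = positive_lasso_cd(X, y, lam=0.05)
    for f, w in zip(feats, weights):
        if w > 1e-8:
            print(f, round(w, 3))
```

Because the number of candidate features grows exponentially with the number of objects, exhaustive enumeration of this kind is practical only for small object sets. With more candidate columns than object pairs, the selected weights need not coincide with the generating features.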

MeSH terms

  • Algorithms*
  • Computer Graphics*
  • Computer Simulation
  • Humans
  • Least-Squares Analysis*
  • Linear Models
  • Neural Networks, Computer*
  • Phonation
  • Phonetics
  • Speech Acoustics
  • Speech Perception