Relevance of a feed-forward model of visual attention for goal-oriented and free-viewing tasks

IEEE Trans Image Process. 2010 Nov;19(11):2801-13. doi: 10.1109/TIP.2010.2052262. Epub 2010 Jun 7.

Abstract

A purely bottom-up model of visual attention is proposed and compared to five state-of-the-art models. The role of low-level visual features is examined in two contexts. Two datasets are used: one containing eye-tracking data recorded during a free-viewing task, and a second containing 5000 hand-labeled pictures (observers had to enclose the most visually interesting objects in a rectangle). The relevance of the bottom-up models, i.e. the ability of a model to predict where the salient areas are located, is evaluated. Regardless of the metric and the dataset, the degree of similarity between predictions and ground truth is significantly above chance. The proposed model, which rests on a small number of features, is shown to be a good predictor not only of human visual fixations but also of the objects chosen as interesting by observers. This study suggests that low-level visual features play a significant role in a free-viewing task but also in a high-level visual task, such as the choice of the object of interest in a complex visual scene. Another outcome concerns the viewing duration used in eye-tracking experiments. Results suggest that this parameter is ultimately not as critical as one would expect.
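
The abstract does not name the similarity metrics it uses. As an illustration only, the sketch below scores a saliency map against a binary ground-truth mask with a rank-based ROC area (AUC), a measure commonly used for this kind of comparison; a value of 0.5 corresponds to chance. The function names `saliency_auc` and `rectangle_mask`, and the choice of AUC itself, are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def rectangle_mask(shape, top, left, bottom, right):
    """Hypothetical helper: binary mask for a hand-drawn rectangle.

    Rows top..bottom-1 and columns left..right-1 are marked as salient.
    """
    mask = np.zeros(shape, dtype=bool)
    mask[top:bottom, left:right] = True
    return mask


def saliency_auc(saliency, mask):
    """ROC area of a saliency map against a binary ground-truth mask.

    Computed from the Mann-Whitney U statistic: the probability that a
    randomly chosen salient pixel has a higher saliency value than a
    randomly chosen non-salient pixel (ties count as one half).
    """
    s = np.asarray(saliency, dtype=float).ravel()
    m = np.asarray(mask, dtype=bool).ravel()
    if s.size != m.size:
        raise ValueError("saliency and mask must have the same number of pixels")
    n_pos = int(m.sum())
    n_neg = m.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("mask must contain both salient and non-salient pixels")

    # Assign 1-based ranks; tied values share the mean rank of their group.
    _, inverse, counts = np.unique(s, return_inverse=True, return_counts=True)
    ends = np.cumsum(counts)
    starts = ends - counts
    mean_ranks = (starts + ends + 1) / 2.0
    ranks = mean_ranks[inverse]

    u = ranks[m].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


if __name__ == "__main__":
    # Toy 4x4 example: the model highlights exactly the labeled rectangle.
    truth = rectangle_mask((4, 4), 1, 1, 3, 3)
    prediction = np.zeros((4, 4))
    prediction[1:3, 1:3] = 1.0
    print(saliency_auc(prediction, truth))         # perfect agreement
    print(saliency_auc(np.zeros((4, 4)), truth))   # uninformative map
```

The same function would apply to the eye-tracking case by passing a mask of fixated pixels in place of the rectangle; how fixations are turned into a mask is a further assumption left open here.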

MeSH terms

  • Algorithms
  • Attention / physiology*
  • Fixation, Ocular / physiology*
  • Humans
  • Image Processing, Computer-Assisted
  • Models, Biological*
  • Pattern Recognition, Automated
  • Photic Stimulation