Visual scene analysis in humans has been characterized by the presence of regions in extrastriate cortex that are selectively responsive to scenes compared with objects or faces. While these regions have often been interpreted as representing high-level properties of scenes (e.g. category), they also exhibit substantial sensitivity to low-level (e.g. spatial frequency) and mid-level (e.g. spatial layout) properties, and it is unclear how these disparate findings can be united in a single framework. In this opinion piece, we suggest that this problem can be resolved by questioning the utility of the classical low- to high-level framework of visual perception for scene processing, and discuss why low- and mid-level properties may be particularly diagnostic for the behavioural goals specific to scene perception as compared to object recognition. In particular, we highlight the contributions of low-level vision to scene representation by reviewing (i) retinotopic biases and receptive field properties of scene-selective regions and (ii) the temporal dynamics of scene perception that demonstrate overlap of low- and mid-level feature representations with those of scene category. We discuss the relevance of these findings for scene perception and suggest a more expansive framework for visual scene analysis.This article is part of the themed issue 'Auditory and visual scene analysis'.
Keywords: category-selectivity; electro-encephalography; functional magnetic resonance imaging; image statistics; natural scenes; retinotopy.
© 2017 The Author(s).