Disentangling the Independent Contributions of Visual and Conceptual Features to the Spatiotemporal Dynamics of Scene Categorization

J Neurosci. 2020 Jul 1;40(27):5283-5299. doi: 10.1523/JNEUROSCI.2088-19.2020. Epub 2020 May 28.

Abstract

Human scene categorization is characterized by its remarkable speed. While many visual and conceptual features have been linked to this ability, significant correlations exist between feature spaces, impeding our ability to determine their relative contributions to scene categorization. Here, we used a whitening transformation to decorrelate a variety of visual and conceptual features and assess the time course of their unique contributions to scene categorization. Participants (both sexes) viewed 2250 full-color scene images drawn from 30 different scene categories while having their brain activity measured through 256-channel EEG. We examined the variance explained at each electrode and time point of visual event-related potential (vERP) data from nine different whitened encoding models. These ranged from low-level features obtained from filter outputs to high-level conceptual features requiring human annotation. The amount of category information in the vERPs was assessed through multivariate decoding methods. Behavioral similarity measures were obtained in separate crowdsourced experiments. We found that all nine models together contributed 78% of the variance of human scene similarity assessments and were within the noise ceiling of the vERP data. Low-level models explained earlier vERP variability (88 ms after image onset), whereas high-level models explained later variance (169 ms). Critically, only high-level models shared vERP variability with behavior. Together, these results suggest that scene categorization is primarily a high-level process, but one that relies on previously extracted low-level features.

Significance Statement

In a single fixation, we glean enough information to describe a general scene category. Many types of features are associated with scene categories, ranging from low-level properties, such as colors and contours, to high-level properties, such as objects and attributes. Because these properties are correlated, it is difficult to understand each property's unique contribution to scene categorization. This work uses a whitening transformation to remove the correlations between features and examines the extent to which each feature contributes to visual event-related potentials over time. We found that low-level visual features contributed first but were not correlated with categorization behavior. High-level features followed 80 ms later, providing key insights into how the brain makes sense of a complex visual world.

Keywords: EEG; decoding; encoding; natural scenes.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Adolescent
  • Brain / physiology
  • Color
  • Electroencephalography
  • Evoked Potentials, Visual / physiology
  • Female
  • Form Perception / physiology*
  • Humans
  • Male
  • Mental Processes / physiology
  • Noise
  • Photic Stimulation
  • Visual Perception / physiology*
  • Wavelet Analysis
  • Young Adult