2011 Oct 11
Reconstructing Visual Experiences From Brain Activity Evoked by Natural Movies
Quantitative modeling of human brain activity can provide crucial insights about cortical representations [1, 2] and can form the basis for brain decoding devices [3-5]. Recent functional magnetic resonance imaging (fMRI) studies have modeled brain activity elicited by static visual patterns and have reconstructed these patterns from brain activity [6-8]. However, blood oxygen level-dependent (BOLD) signals measured via fMRI are very slow, so it has been difficult to model brain activity elicited by dynamic stimuli such as natural movies. Here we present a new motion-energy [10, 11] encoding model that largely overcomes this limitation. The model describes fast visual information and slow hemodynamics by separate components. We recorded BOLD signals in occipitotemporal visual cortex of human subjects who watched natural movies and fit the model separately to individual voxels. Visualization of the fit models reveals how early visual areas represent the information in movies. To demonstrate the power of our approach, we also constructed a Bayesian decoder by combining estimated encoding models with a sampled natural movie prior. The decoder provides remarkable reconstructions of the viewed movies. These results demonstrate that dynamic brain activity measured under naturalistic conditions can be decoded using current fMRI technology.
Copyright © 2011 Elsevier Ltd. All rights reserved.
Conflict of interest statement
The authors declare no conflict of interest.
Figure 1. Schematic diagram of the motion-energy encoding model
A, Stimuli first pass through a fixed set of nonlinear spatio-temporal motion-energy filters (shown in detail in panel B), and then through a set of hemodynamic response filters fit separately to each voxel. The summed output of the filter bank provides a prediction of BOLD signals. B, The nonlinear motion-energy filter bank consists of several filtering stages. Stimuli are first transformed into the Commission internationale de l'éclairage (CIE) L*A*B* color space and the color channels are stripped off. Luminance signals then pass through a bank of 6,555 spatio-temporal Gabor filters differing in position, orientation, direction, and spatial and temporal frequency (see Supplemental Information for details). Motion energy is calculated by squaring and summing Gabor filters in quadrature. Finally, signals pass through a compressive nonlinearity and are temporally down-sampled to the fMRI sampling rate (1 Hz).
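The core motion-energy computation described in panel B can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it uses a single quadrature pair of space-time Gabor filters over one spatial dimension (the paper uses 6,555 two-dimensional filters), and the filter sizes, frequencies, and the log nonlinearity are illustrative assumptions.

```python
import numpy as np

def gabor_pair(xs, ts, sf, tf):
    # Quadrature pair of space-time Gabor filters (1D space for brevity;
    # the paper uses 2D spatial Gabors) tuned to spatial frequency sf
    # and temporal frequency tf.
    x, t = np.meshgrid(xs, ts, indexing="ij")
    env = np.exp(-(x ** 2 + t ** 2) / 0.5)  # Gaussian envelope (illustrative width)
    carrier = 2 * np.pi * (sf * x + tf * t)
    return env * np.cos(carrier), env * np.sin(carrier)

def motion_energy(stimulus, even, odd):
    # Square and sum the quadrature-pair responses (phase-invariant
    # motion energy), then apply a compressive log nonlinearity.
    r_even = float(np.sum(stimulus * even))
    r_odd = float(np.sum(stimulus * odd))
    return np.log1p(r_even ** 2 + r_odd ** 2)
```

Because the filters are tuned to a particular drift direction, a grating moving in the preferred direction yields much larger motion energy than the same grating moving in the opposite direction.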
Figure 2. The directional motion-energy model captures motion information
A, (top) The static encoding model includes only Gabor filters that are not sensitive to motion. (bottom) Prediction accuracy of the static model is shown on a flattened map of the cortical surface of one subject (S1). Prediction accuracy is relatively poor. B, The non-directional motion-energy encoding model includes Gabor filters tuned to a range of temporal frequencies, but motion in opponent directions is pooled. Prediction accuracy of this model is better than that of the static model. C, The directional motion-energy encoding model includes Gabor filters tuned to a range of temporal frequencies and directions. This model provides the most accurate predictions of all models tested. D and E, Voxel-wise comparisons of prediction accuracy between the three models. The directional motion-energy model performs significantly better than the other two models, although the difference between the non-directional and directional motion models is small. See also Figure S1 for subject- and area-wise comparisons. F, The spatial receptive field of one voxel (left), and its spatial and temporal frequency selectivity (right). This receptive field is located near the fovea, and it is high-pass for spatial frequency and low-pass for temporal frequency. This voxel thus prefers static or slow-speed motion. G, Receptive field for a second voxel. This receptive field is located in the lower periphery, and it is band-pass for spatial frequency and high-pass for temporal frequency. This voxel thus prefers higher-speed motion than the voxel in F. H, Comparison of retinotopic angle maps estimated using (top) the motion-energy encoding model and (bottom) conventional multi-focal mapping on a flattened cortical map. The angle maps are similar, even though they were estimated using independent data sets and methods. I, Comparison of eccentricity maps estimated as in panel H. The maps are similar except in the far periphery, where the multi-focal mapping stimulus was coarse.
J, Optimal speed projected onto a flattened map as in panel H. Voxels near the fovea tend to prefer slow-speed motion, while those in the periphery tend to prefer high-speed motion. See also Figure S1B for subject-wise comparisons.
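The speed preferences in panels F, G, and J follow from a standard relation in the motion literature: a filter's preferred speed is its peak temporal frequency divided by its peak spatial frequency. A small sketch, with illustrative numbers rather than the paper's fitted values:

```python
def preferred_speed(tf_hz, sf_cpd):
    # Preferred drift speed (deg/s) implied by a filter's peak temporal
    # frequency (Hz) and spatial frequency (cycles/deg): speed = tf / sf.
    return tf_hz / sf_cpd

# A foveal voxel tuned to high spatial and low temporal frequency
# (as in panel F) implies a slow preferred speed; a peripheral voxel
# with the opposite tuning (panel G) implies a fast one.
slow = preferred_speed(tf_hz=1.0, sf_cpd=4.0)   # 0.25 deg/s
fast = preferred_speed(tf_hz=8.0, sf_cpd=1.0)   # 8.0 deg/s
```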
Figure 3. Identification analysis
A, Identification accuracy for one subject (S1). The test data in our experiment consisted of 486 volumes (seconds) of BOLD signals evoked by the test movies. The estimated model yielded 486 volumes of BOLD signals predicted for the same movies. The brightness of the point in the m-th column and n-th row represents the log-likelihood (see Supplemental Information) of the BOLD signals evoked at the m-th second given the BOLD signal predicted at the n-th second. The highest log-likelihood in each column is designated by a red circle and thus indicates the choice of the identification algorithm. B, Temporal offset between the correct timing and the timing identified by the algorithm, for the same subject shown in panel A. The algorithm was correct to within ± one volume (second) 95% of the time (464/486); chance performance is less than 1% (3/486; i.e., three volumes centered at the correct timing). C, Scaling of identification accuracy with set size. To understand how identification accuracy scales with the size of the stimulus set, we enlarged the identification stimulus set to include additional stimuli drawn from a natural movie database (but not actually used in the experiment). For all three subjects, identification accuracy (within ± one volume) is greater than 75% even when the set of potential movies includes one million clips. This is far above chance (gray dashed line).
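The identification procedure in panel A can be sketched as a likelihood matrix plus a per-column argmax. This is a simplified illustration under an isotropic Gaussian noise assumption; the paper's likelihood (see its Supplemental Information) accounts for the estimated noise covariance, which this sketch does not.

```python
import numpy as np

def identify(observed, predicted, noise_var=1.0):
    # observed, predicted: (T, V) arrays of measured and model-predicted
    # BOLD volumes for T one-second segments across V voxels. Under
    # isotropic Gaussian noise, the log-likelihood of an observed volume
    # given prediction n is -||obs - pred_n||^2 / (2 * var), up to a
    # constant; the argmax over n is the identified segment.
    picks = []
    for obs in observed:
        ll = -np.sum((predicted - obs) ** 2, axis=1) / (2.0 * noise_var)
        picks.append(int(np.argmax(ll)))
    return np.array(picks)
```

With predictions that are even moderately accurate, the correct segment's likelihood dominates, which is why accuracy degrades only slowly as the candidate set grows (panel C).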
Figure 4. Reconstructions of natural movies from BOLD signals
A, First row: Three frames from a natural movie used in the experiment, taken one second apart. Second through sixth rows: frames from the five clips with the highest posterior probability. The maximum a posteriori (MAP) reconstruction is shown in row two. Seventh row: The averaged high posterior (AHP) reconstruction. The MAP provides a good reconstruction of the second and third frames, while the AHP provides more robust reconstructions across frames. B and C, Additional examples of reconstructions, format same as in panel A. D, Reconstruction accuracy (correlation in motion-energy; see Supplemental Information) for all three subjects. Error bars indicate ± 1 s.e.m. across one-second clips. Both the MAP and AHP reconstructions are significant, though the AHP reconstructions are significantly better than the MAP reconstructions. Dashed lines show chance performance (P = 0.01). See also Figure S2.
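The MAP and AHP reconstructions can be sketched as follows. This is a toy illustration under assumed simplifications: a flat prior over clips sampled from the movie database, an isotropic Gaussian likelihood, and a generic `encode` function standing in for the fitted voxel-wise encoding models; none of these names or choices come from the paper itself.

```python
import numpy as np

def reconstruct(observed, prior_clips, encode, k=5):
    # observed: (V,) BOLD volume. prior_clips: (N, ...) candidate clips
    # sampled from a natural-movie prior. encode maps a clip to its
    # predicted BOLD response (V,). A Gaussian likelihood scores each
    # clip; with a flat prior over the samples, the MAP reconstruction
    # is the best-scoring clip and the AHP reconstruction averages the
    # top-k clips.
    ll = np.array([-np.sum((encode(c) - observed) ** 2)
                   for c in prior_clips])
    order = np.argsort(ll)[::-1]
    return prior_clips[order[0]], prior_clips[order[:k]].mean(axis=0)
```

Averaging the top-k clips (AHP) trades sharpness for robustness: a single wrong top candidate ruins the MAP frame, whereas the average degrades gracefully, consistent with panel D.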
Research Support, N.I.H., Extramural
Brain Mapping / methods
Image Processing, Computer-Assisted / methods
Magnetic Resonance Imaging / methods
Neurophysiology / methods
Visual Cortex / anatomy & histology
Visual Cortex / blood supply
Visual Cortex / physiology