Primary visual cortex straightens natural video trajectories

Olivier J Hénaff et al. Nat Commun. 2021 Oct 13;12(1):5982. doi: 10.1038/s41467-021-25939-z.

Free PMC article

Abstract

Many sensory-driven behaviors rely on predictions about future states of the environment. Visual input typically evolves along complex temporal trajectories that are difficult to extrapolate. We test the hypothesis that spatial processing mechanisms in the early visual system facilitate prediction by constructing neural representations that follow straighter temporal trajectories. We recorded V1 population activity in anesthetized macaques while presenting static frames taken from brief video clips, and developed a procedure to measure the curvature of the associated neural population trajectory. We found that V1 populations straighten naturally occurring image sequences, but entangle artificial sequences that contain unnatural temporal transformations. We show that these effects arise in part from computational mechanisms that underlie the stimulus selectivity of V1 cells. Together, our findings reveal that the early visual system uses a set of specialized computations to build representations that can support prediction in the natural environment.


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Measuring curvature in the pixel and neural domains.
a An example sequence of three movie frames. b Visualization of a high-dimensional representation of this sequence. Each point corresponds to a single movie frame, with each coordinate specifying the brightness of a pixel within that frame. Discrete curvature cpixel is defined as the unsigned angle between two segments connecting adjacent frames. c The curvature of a sequence is fully determined by the collection of pairwise distances. d Simulated responses of two neurons to a sequence of three movie frames. Error bars illustrate ± 1 standard deviation. e Joint response probabilities for these two neurons, for each of the three frames. f Representation of the same sequence of frames in a two-dimensional neural distance space. The distance metric is frame discriminability (in units of d': Euclidean distance divided by standard deviation). g Comparison of neural curvature estimates obtained under a response model that includes correlations (abscissa) to one that assumes independence (ordinate).
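The discrete curvature defined in panel b can be sketched as follows (a minimal illustration; the function name and array conventions here are ours, not the paper's):

```python
import numpy as np

def discrete_curvature(frames):
    """Unsigned angles (in degrees) between segments connecting adjacent
    frames of a trajectory.

    frames : array of shape (n_frames, n_dims); each row is one frame,
             e.g. flattened pixel intensities.
    Returns n_frames - 2 angles, one per interior frame.
    """
    frames = np.asarray(frames, dtype=float)
    segments = np.diff(frames, axis=0)                      # frame-to-frame difference vectors
    segments /= np.linalg.norm(segments, axis=1, keepdims=True)
    cosines = np.sum(segments[:-1] * segments[1:], axis=1)  # dot products of adjacent unit segments
    return np.degrees(np.arccos(np.clip(cosines, -1.0, 1.0)))
```

A trajectory whose frames lie on a line yields 0°; a right-angle turn yields 90°.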
Fig. 2. Testing the temporal straightening hypothesis.
a We presented individual frames from different video clips in randomized order. Each frame was shown for 200 ms, followed by 100 ms of constant luminance. We used multi-electrode arrays to record V1 population activity. The raster illustrates spiking activity recorded over a 1.3 s period from an example V1 population ("Population 7", consisting of 39 units). b Mean response for three units in the example population, to frames of "Movie 7". We fit a descriptive response model to each unit. Points indicate mean spike counts, lines illustrate predicted responses of the fitted model. c We evaluated the model's goodness of fit by computing the correlation between predicted and measured response mean, variance, and covariance across the units and frames in each dataset. Each point corresponds to a dataset (red points indicate the example dataset). Blue lines indicate the inclusion criteria independently applied to these three statistics (included datasets are black points, excluded datasets are transparent green points). d Two-dimensional projections of the trajectory for the example dataset, in the pixel domain (left), and in the neural domain (right). e Estimated neural curvature of the example dataset (white triangle) and its expected distribution under the null model (gray histogram, see "Methods"). The black and gray triangles indicate the pixel-domain value and the null model's mean, respectively. Relative curvature is −69°; *P < 0.05, non-parametric test.
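As a rough sketch of the quantity reported here, relative curvature can be read as the mean curvature of the neural trajectory minus that of the corresponding pixel trajectory, with negative values indicating straightening (a simplified reading: the paper estimates neural curvature under a statistical response model rather than from raw trajectories):

```python
import numpy as np

def mean_curvature(points):
    """Mean unsigned angle (degrees) between adjacent segments of a trajectory."""
    d = np.diff(np.asarray(points, dtype=float), axis=0)
    d /= np.linalg.norm(d, axis=1, keepdims=True)
    cos = np.clip(np.sum(d[:-1] * d[1:], axis=1), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)).mean())

def relative_curvature(pixel_frames, neural_trajectory):
    """Neural minus pixel curvature; negative values indicate straightening,
    positive values indicate entangling."""
    return mean_curvature(neural_trajectory) - mean_curvature(pixel_frames)
```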
Fig. 3. Curvature reduction for natural movie sequences.
a Relative curvature for seven V1 populations probed with twenty natural movies. Blue colors indicate neural straightening, red colors indicate neural entangling, and gray indicates missing data (see “Methods”). b Relative curvature for each V1 population, averaged across all movies. Error bars indicate s.e.m. across movies, n values indicate population size. c Relative curvature for twenty movies, averaged across all populations. Error bars indicate s.e.m. across populations. Insets illustrate a single frame from three movies that elicited no, medium, and strong straightening (left, middle, and right, respectively; see “smile”, “walking” and “prairie” in “Methods”).
Fig. 4. Testing the specificity of temporal straightening for natural sequences.
a An example sequence illustrating the first, middle, and last frame of an "unnatural" movie, constrained to be straight in the pixel domain. b Mean responses of three example units (Population 1) to frames of Movie 1. Points indicate mean spike counts, lines illustrate predicted responses of the descriptive response model. Prediction line for Neuron 2 is shown in red for clarity. c The correlation between predicted and measured response mean, variance, and covariance across all units and frames in each dataset (red points correspond to the example dataset). Blue lines indicate the inclusion criteria independently applied to these three statistics (included datasets are black points, excluded datasets are transparent green points). d Two-dimensional projections of the trajectory of the example dataset, in the pixel domain (left), and in the neural domain (right). e Neural curvature of the example dataset (white triangle) and its expected distribution under the null model (gray histogram). The black and gray triangles indicate the pixel-domain value and the null model's mean, respectively. Relative curvature is 82°. *P < 0.05, non-parametric test.
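A sequence that is exactly straight in the pixel domain, as in panel a, can be constructed by placing frames at evenly spaced points on the line between two endpoint frames (a minimal sketch; the paper's actual construction of its unnatural sequences may differ):

```python
import numpy as np

def pixel_straight_sequence(first_frame, last_frame, n_frames):
    """Frames evenly spaced on the straight line between two endpoint frames
    in pixel space; every discrete curvature along the sequence is 0 degrees."""
    first = np.asarray(first_frame, dtype=float).ravel()
    last = np.asarray(last_frame, dtype=float).ravel()
    alphas = np.linspace(0.0, 1.0, n_frames)[:, None]   # interpolation weights
    return (1.0 - alphas) * first + alphas * last
```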
Fig. 5. Curvature increase for unnatural sequences.
a Relative curvature for seven V1 populations probed with twenty unnatural movies. The ordering of the unnatural movies is matched to the ordering of the corresponding natural movies (Fig. 3). b Relative curvature for each V1 population, averaged across all movies. Error bars indicate s.e.m. across movies, n is population size. c Relative curvature for twenty movies, averaged across all populations. Error bars indicate s.e.m. across populations. Insets illustrate a single frame from three movies that elicited strong, mild, and no entangling (left, middle, and right, respectively; see “bees”, “walking”, and “carnegie dam” in “Methods”).
Fig. 6. Comparing different spatial and temporal scales.
a Relative curvature for twenty movies, displayed at two spatial scales, averaged across all V1 populations. Black points indicate natural movies, white points unnatural ones (temporal scale is framerate × 1). b Relative curvature for forty movies, calculated for two temporal scales, averaged across all V1 populations. Black points indicate natural movies, white points unnatural ones (spatial scale is zoom × 1). c Relative curvature as a function of average firing rate for all natural movies. Each point illustrates a dataset, zoom × 1 is shown in red, zoom × 2 is shown in gray, temporal scale is framerate × 1. *P = 0.01, two-sided Wilcoxon signed-rank test, n = 61 datasets.
Fig. 7. Relating V1 computation to temporal straightening.
a LN–LN model schematic. Stimuli are processed by a set of four linear Gaussian-derivative filters with phases differing in steps of 90°, followed by response exponentiation, linear pooling, and divisive normalization. b Measured and predicted tuning of an example neuron for direction of motion, spatial frequency, temporal modulation, and a natural movie ("Carnegie Dam", at zoom × 1). c Goodness-of-fit statistics across all individual V1 neurons. The model was "trained" on a set of white noise stimuli and a collection of drifting gratings. It was "tested" on the natural and unnatural movie frames. Box bounds represent the interquartile range, whiskers represent the range between the 1st and 99th percentiles. d Two-dimensional projections of an unnatural video's trajectory in the pixel domain (left), the model domain (middle), and the neural domain (right). e Comparison of model-predicted and data-estimated relative curvature for forty movies, averaged across all V1 populations. Black points indicate natural movies, white points unnatural ones.
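The LN–LN cascade in panel a can be sketched schematically as follows (all parameter values, the quadratic nonlinearity, and the cosine-phase filter construction here are illustrative assumptions, not the fitted model):

```python
import numpy as np

def quadrature_filter_bank(size=15, sigma=3.0, freq=0.15):
    """Four oriented filters with phases 0, 90, 180, and 270 degrees
    (a stand-in for the paper's Gaussian-derivative filters)."""
    x = np.arange(size) - size // 2
    xx, yy = np.meshgrid(x, x)
    envelope = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return [envelope * np.cos(2.0 * np.pi * freq * xx + np.deg2rad(phase))
            for phase in (0, 90, 180, 270)]

def lnln_response(image, filters, pool_weights, sigma_norm=1.0):
    """LN-LN cascade: linear filtering (L), rectification and squaring (N),
    linear pooling across subunits (L), divisive normalization (N)."""
    drives = np.array([np.maximum(np.sum(f * image), 0.0) ** 2 for f in filters])
    pooled = np.dot(pool_weights, drives)
    return pooled / (sigma_norm + drives.sum())
```

Pooling all four phase-shifted subunits with equal weights yields a complex-cell-like (phase-invariant) response; weighting a single subunit yields a simple-cell-like one, the distinction dissected in Fig. 8.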
Fig. 8. Dissection of the elements of the stimulus response model that shape neural trajectories.
The correlation between model-predicted and data-estimated curvature for different sub-models. From left to right: the full fitted model, model without divisive normalization, model with only complex cells, and model with only simple cells. Black bars indicate natural movies, white bars unnatural ones, the dotted line indicates a significance criterion of P = 0.05 (two-sided test).
