Learning intermediate-level representations of form and motion from natural movies

Charles F Cadieu; Bruno A Olshausen

doi:10.1162/NECO_a_00247

Neural Comput. 2012 Apr;24(4):827-66. doi: 10.1162/NECO_a_00247. Epub 2011 Dec 14.

Authors

Charles F Cadieu¹, Bruno A Olshausen

Affiliation

¹ Redwood Center for Theoretical Neuroscience, Helen Wills Neuroscience Institute, and School of Optometry, University of California, Berkeley, Berkeley, CA 94720, USA. cadieu@berkeley.edu

PMID: 22168556
DOI: 10.1162/NECO_a_00247

Abstract

We present a model of intermediate-level visual representation that is based on learning invariances from movies of the natural environment. The model is composed of two stages of processing: an early feature representation layer and a second layer in which invariances are explicitly represented. Invariances are learned as the result of factoring apart the temporally stable and dynamic components embedded in the early feature representation. The structure contained in these components is made explicit in the activities of second-layer units that capture invariances in both form and motion. When trained on natural movies, the first layer produces a factorization, or separation, of image content into a temporally persistent part representing local edge structure and a dynamic part representing local motion structure, consistent with known response properties in early visual cortex (area V1). This factorization linearizes statistical dependencies among the first-layer units, making them learnable by the second layer. The second-layer units are split into two populations according to the factorization in the first layer. The form-selective units receive their input from the temporally persistent part (local edge structure) and after training result in a diverse set of higher-order shape features consisting of extended contours, multiscale edges, textures, and texture boundaries. The motion-selective units receive their input from the dynamic part (local motion structure) and after training result in a representation of image translation over different spatial scales and directions, in addition to more complex deformations. These representations provide a rich description of dynamic natural images and testable hypotheses regarding intermediate-level representation in visual cortex.
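The first-layer factorization described above can be illustrated with a small numerical sketch. The abstract does not say how the temporally persistent and dynamic parts are computed, so the sketch makes an assumption: each first-layer feature is treated as a complex-valued (quadrature-pair) filter whose response amplitude carries the persistent edge structure and whose frame-to-frame phase change carries the local motion. The filter here is a fixed, hand-built Gabor standing in for a learned feature, the input is a synthetic drifting grating rather than a natural movie, and the names `quadrature_filter` and `factorize` are hypothetical.

```python
import numpy as np

def quadrature_filter(size=32, freq=0.125, sigma=6.0):
    """Complex 1-D Gabor: real and imaginary parts form a quadrature pair.
    Hand-built stand-in for a learned first-layer feature (assumption)."""
    x = np.arange(size) - size / 2
    envelope = np.exp(-x**2 / (2 * sigma**2))
    return envelope * np.exp(2j * np.pi * freq * x)

def factorize(z):
    """Split a complex response time series into a persistent and a dynamic part.

    amplitude  : |z[t]|, which changes slowly while the same edge is in view
    phase_step : wrapped phase advance between consecutive frames
    """
    amplitude = np.abs(z)
    # angle(z[t+1] * conj(z[t])) is the phase difference wrapped to (-pi, pi]
    phase_step = np.angle(z[1:] * np.conj(z[:-1]))
    return amplitude, phase_step

# Synthetic "movie": a 1-D grating drifting toward +x at `speed` pixels per frame.
size, freq, speed, frames = 32, 0.125, 0.5, 20
x = np.arange(size) - size / 2
movie = np.stack([np.cos(2 * np.pi * freq * (x - speed * t)) for t in range(frames)])

# Project every frame onto the complex filter (inner product with its conjugate).
z = movie @ np.conj(quadrature_filter(size, freq))

amplitude, phase_step = factorize(z)

# For a pure translation the phase advances by about -2*pi*freq*speed per frame,
# so the drift speed can be read back from the dynamic part alone.
velocity = -phase_step.mean() / (2 * np.pi * freq)

print("relative amplitude variation:", amplitude.std() / amplitude.mean())
print("estimated speed (pixels/frame):", velocity)
```

In this toy setting the amplitude stays nearly constant across frames while the phase step is nearly constant and proportional to the drift speed, which is the sense in which the two parts separate form from motion. A second layer in the spirit of the abstract would then take the amplitudes as input to form-selective units and the phase steps as input to motion-selective units; that stage is not sketched here.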

Publication types

Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Form Perception*
Humans
Learning / physiology*
Models, Neurological
Motion*
Neurons / physiology
Pattern Recognition, Visual / physiology
Photic Stimulation / methods
Visual Cortex / physiology*