Front Neurorobot. 2019 Jul 16;13:49. doi: 10.3389/fnbot.2019.00049. eCollection 2019.

Autonomous Development of Active Binocular and Motion Vision Through Active Efficient Coding

Alexander Lelais et al. Front Neurorobot. 2019.

Abstract

We present a model for the autonomous and simultaneous learning of active binocular and motion vision. The model is based on the Active Efficient Coding (AEC) framework, a recent generalization of classic efficient coding theories to active perception. Through sparse coding, the model learns to efficiently encode the incoming visual signals generated by an object moving in 3-D. Simultaneously, it learns how to produce eye movements that further improve the efficiency of the sensory coding. This learning is driven by an intrinsic motivation to maximize the system's coding efficiency. We test our approach on the humanoid robot iCub in simulation. The model demonstrates self-calibration of accurate object fixation and tracking of moving objects. Our results show that the model keeps improving until it hits physical constraints such as camera or motor resolution, or limits on its internal coding capacity. Furthermore, we show that the emerging sensory tuning properties are in line with results on disparity, motion, and motion-in-depth tuning in the visual cortex of mammals. The model suggests that vergence and tracking eye movements can be viewed as fundamentally having the same objective of maximizing the coding efficiency of the visual system and that they can be learned and calibrated jointly through AEC.
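To make the learning loop concrete, the following is a minimal sketch of the AEC principle described above: a sparse coder reconstructs each incoming patch from a few basis functions, the negative reconstruction error serves as an intrinsic reward, and the dictionary adapts to reduce that error. The toy sizes, the random stand-in input, the greedy matching pursuit, and the gradient-style dictionary update are illustrative assumptions, not the authors' implementation; the eye-movement policy update is sketched below with Figure 1.

```python
import numpy as np

rng = np.random.default_rng(0)

PATCH_DIM = 64    # flattened binocular, two-frame patch (toy size)
N_BASIS = 128     # number of basis functions in the dictionary
N_ACTIVE = 10     # non-zero coefficients per patch

# Overcomplete dictionary with unit-norm columns (atoms).
D = rng.standard_normal((PATCH_DIM, N_BASIS))
D /= np.linalg.norm(D, axis=0)

def sparse_code(x, D, k):
    """Greedy matching pursuit: select k atoms, return coefficients and residual."""
    a = np.zeros(D.shape[1])
    r = x.copy()
    for _ in range(k):
        j = np.argmax(np.abs(D.T @ r))
        c = D[:, j] @ r
        a[j] += c
        r -= c * D[:, j]
    return a, r

def observe(action):
    """Stand-in for the binocular input; a real agent would render the
    iCub cameras after executing `action` (here: random noise)."""
    return rng.standard_normal(PATCH_DIM)

for t in range(200):
    action = rng.uniform(-1.0, 1.0)        # placeholder eye-movement command
    x = observe(action)
    a, residual = sparse_code(x, D, N_ACTIVE)
    reward = -np.sum(residual ** 2)        # intrinsic reward: coding efficiency
    # Gradient step on the reconstruction error w.r.t. the dictionary.
    D += 0.01 * np.outer(residual, a)
    D /= np.linalg.norm(D, axis=0)
    # `reward` would drive the reinforcement learner (see Figure 1).
```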

Keywords: active perception; autonomous learning; binocular vision; efficient coding; intrinsic motivation; optokinetic nystagmus; smooth pursuit.


Figures

Figure 1
Overview of the active vision architecture. From the binocular visual input at time points t − 1 and t, patches of different resolutions are extracted for the coarse pc (blue) and fine pf (red) scales. These patches are encoded by the spatio-temporal basis functions of the coarse-scale (blue) and fine-scale (red) sparse coders. The activations of both sparse coders' basis functions ϕc and ϕf form the state vector st. The negative reconstruction error measures the encoding efficiency and serves as the reward signal rt for the reinforcement learner. The Critic computes a TD-error δt from rt and st, and three distinct actors generate movement actions αpan,t, αtilt,t, and αvergence,t from st for the respective camera joints.
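A hedged sketch of the reinforcement-learning stage in this caption, assuming a linear critic and one softmax actor per camera joint; the function approximators, learning rates, and discrete action set are assumptions, and the paper's exact actor-critic variant may differ.

```python
import numpy as np

rng = np.random.default_rng(1)

STATE_DIM = 256                        # dimension of s_t = (phi_c, phi_f)
ACTIONS = np.linspace(-1.0, 1.0, 9)    # toy discrete joint commands
GAMMA, LR_CRITIC, LR_ACTOR = 0.3, 0.01, 0.01

w_critic = np.zeros(STATE_DIM)                      # linear value function V(s)
W_actors = {j: np.zeros((len(ACTIONS), STATE_DIM))  # one softmax policy per joint
            for j in ("pan", "tilt", "vergence")}

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def step(s, r, s_next):
    """One critic/actor update; returns the sampled action per joint."""
    global w_critic
    delta = r + GAMMA * (w_critic @ s_next) - (w_critic @ s)  # TD-error delta_t
    w_critic += LR_CRITIC * delta * s
    actions = {}
    for joint, W in W_actors.items():
        p = softmax(W @ s)
        i = rng.choice(len(ACTIONS), p=p)
        grad = -np.outer(p, s)     # gradient of log pi(i|s) for a softmax policy
        grad[i] += s
        W += LR_ACTOR * delta * grad
        actions[joint] = ACTIONS[i]
    return actions

# Example call with random stand-ins for the sparse-coder activations.
acts = step(rng.random(STATE_DIM), -1.0, rng.random(STATE_DIM))
```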
Figure 2
The agent operating the iCub robot inside the virtual environment rendered by the Gazebo simulator.
Figure 3
(A) Reconstruction error of the sparse coding model. The error is plotted in arbitrary units vs. training time for the coarse-scale (blue) and fine-scale (red) sparse coders (solid lines). The model's encoding performance in a control experiment, in which the actions were sampled uniformly at random from the same action sets (RNDCTL), is plotted as dashed lines. (B) Coarse-scale basis functions. Six representative spatio-temporal basis functions of a coarse-scale dictionary are shown at the start (left) and the end (right) of training. Every basis function consists of four parts: the rows show the corresponding patch for the left (top) and right (bottom) eye, and the columns represent the patch at time t − 1 (left) and t (right).
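For illustration, a reconstruction-error measure like the one in (A) can be computed as follows, with scikit-learn's SparseCoder standing in for the paper's sparse coders and with the four-part patch layout from (B); the dictionary sizes, patch dimensions, and OMP solver are assumptions.

```python
import numpy as np
from sklearn.decomposition import SparseCoder

rng = np.random.default_rng(2)

def make_patch(left_prev, left_now, right_prev, right_now):
    """Concatenate the four parts of a spatio-temporal binocular patch:
    left/right eye at times t-1 and t (cf. panel B)."""
    return np.concatenate([left_prev, left_now, right_prev, right_now])

# Toy patches: four 16-dimensional sub-patches -> 64-dimensional vectors.
patches = np.stack([make_patch(*(rng.standard_normal(16) for _ in range(4)))
                    for _ in range(500)])

D = rng.standard_normal((128, 64))              # atoms as rows
D /= np.linalg.norm(D, axis=1, keepdims=True)   # SparseCoder expects unit norm

coder = SparseCoder(dictionary=D, transform_algorithm="omp",
                    transform_n_nonzero_coefs=10)
codes = coder.transform(patches)
recon_error = np.mean(np.sum((patches - codes @ D) ** 2, axis=1))
print(f"mean reconstruction error: {recon_error:.3f}")
```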
Figure 4
Input image reconstruction. Shown column-wise from left to right are the original whitened camera image and the cropped, down-sampled, and normalized input images for the coarse-scale (top row) and fine-scale (bottom row) sparse coders. To the right of the preprocessed images are the respective images reconstructed with random Gabor wavelets at initialization time and with the learned basis functions at the end of training.
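A minimal sketch of the preprocessing implied by this figure, assuming whitening has already been applied upstream; the window sizes, down-sampling factor, and block-mean filter are illustrative choices, not the paper's exact parameters.

```python
import numpy as np

def block_mean(img, f):
    """Down-sample by factor f via non-overlapping block averaging."""
    h, w = img.shape[0] // f * f, img.shape[1] // f * f
    return img[:h, :w].reshape(h // f, f, w // f, f).mean(axis=(1, 3))

def center_crop(img, size):
    """Cut a square window of side `size` around the image center."""
    cy, cx = img.shape[0] // 2, img.shape[1] // 2
    return img[cy - size // 2: cy + size // 2, cx - size // 2: cx + size // 2]

def normalize(p):
    """Zero-mean, unit-norm normalization of a patch."""
    p = p - p.mean()
    n = np.linalg.norm(p)
    return p / n if n > 0 else p

whitened = np.random.default_rng(3).standard_normal((240, 320))  # toy image
coarse_in = normalize(block_mean(center_crop(whitened, 128), 4))  # coarse scale
fine_in = normalize(center_crop(whitened, 32))                    # fine scale
```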
Figure 5
(A) Testing performance vs. training iteration. Depicted are the respective errors in the pan Δv (yellow), tilt Δv (purple), and vergence Δξ (green) joints of the testing procedures for all test stimuli and movement speeds over 10 trials at the respective points in time during training. The lines represent the median errors and the shaded areas span one interquartile range. (B) Testing performance at the end of training for agents with different sizes of sparse coding dictionaries over 3 experiment repetitions. Significant differences (p < 0.05) between two sets of data are assessed by a t-test and marked (*). Horizontal bars indicate effect size as measured by Cohen's d.
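The significance test and effect size reported here can be reproduced along these lines; the pooled-standard-deviation form of Cohen's d and the independent-samples t-test are reasonable guesses, as the caption does not specify the exact variants, and the error values below are made up for illustration.

```python
import numpy as np
from scipy import stats

def cohens_d(a, b):
    """Cohen's d with the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * np.var(a, ddof=1) +
                      (nb - 1) * np.var(b, ddof=1)) / (na + nb - 2))
    return (np.mean(a) - np.mean(b)) / pooled

a = np.array([0.21, 0.25, 0.19])   # e.g., vergence errors, dictionary size A
b = np.array([0.31, 0.35, 0.28])   # e.g., vergence errors, dictionary size B
t, p = stats.ttest_ind(a, b)
print(f"t={t:.2f}, p={p:.3f}, d={cohens_d(a, b):.2f}")
```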
Figure 6
Learned policy distributions averaged over 10 agents and 50 stimuli. Depicted are action probabilities for the respective pan, tilt, and vergence actors as a function of state errors.
Figure 7
Movement trajectories of an agent for one stimulus. For pan and tilt, the respective joint speed was reset to 0 deg/s every 10 iterations, as indicated by the red bars. For the vergence joint, the fixation angle ξ was initialized with varying vergence errors every 10 iterations. The actual policy π is plotted in yellow (pan), purple (tilt), and green (vergence), and the desired policy π* in black.
Figure 8
Testing performance at the end of training for agents with different configurations. Depicted are the respective errors in the pan Δv (yellow), tilt Δv (purple), and vergence Δξ (green) joints for all test stimuli and movement speeds. The configurations comprise no fine-scale sparse coder (NFS), a coarser action set (CAS), and the standard configuration (STD). Horizontal bars indicate comparisons between two sets of data as assessed by a t-test. Significant differences (p < 0.05) are marked (*) and effect sizes are indicated as measured by Cohen's d.
Figure 9
Basis functions' stimulus preferences for the coarse scale (blue) and fine scale (red) from a typical experiment. (A) Histogram of orientation preferences θ. (B) Histogram of disparity preferences d̂. (C) Histogram of velocity preferences v̂.
Figure 10
(A) Basis functions' disparity preference d̂ at time t vs. t − 1 from a typical experiment. Each dot represents one basis function of the coarse scale (blue) or the fine scale (red). (B) Basis functions' velocity preference v̂ for the left vs. the right eye. (C) Basis functions' velocity preference v̂ vs. disparity preference d̂, averaged over the left and right eye and over times t and t − 1, respectively. The panel shows the basis functions' joint encoding of velocity and disparity: they are sensitive to a wide range of combinations of preferred velocities and preferred disparities, and the two preferences are not correlated.


