Review

Invariance of Visual Operations at the Level of Receptive Fields

Tony Lindeberg. PLoS One 8(7): e66990.

Abstract

The brain is able to maintain a stable perception even though the visual stimuli vary substantially on the retina due to geometric transformations and lighting variations in the environment. This paper presents a theory for achieving basic invariance properties already at the level of receptive fields. Specifically, the presented framework comprises (i) local scaling transformations caused by objects of different size and at different distances to the observer, (ii) locally linearized image deformations caused by variations in the viewing direction in relation to the object, (iii) locally linearized relative motions between the object and the observer and (iv) local multiplicative intensity transformations caused by illumination variations. The receptive field model can be derived by necessity from symmetry properties of the environment and leads to predicted receptive field profiles in good agreement with those measured by cell recordings in mammalian vision. Indeed, the receptive field profiles in the retina, LGN and V1 are close to ideal with respect to these idealized requirements. By complementing receptive field measurements with selection mechanisms over the parameters in the receptive field families, it is shown how true invariance of receptive field responses can be obtained under scaling transformations, affine transformations and Galilean transformations. Thereby, the framework provides a mathematically well-founded and biologically plausible model for how basic invariance properties can be achieved already at the level of receptive fields, supporting invariant recognition of objects and events under variations in viewpoint, retinal size, object motion and illumination.
The theory can explain the different shapes of receptive field profiles found in biological vision, which are tuned to different sizes and orientations in the image domain as well as to different image velocities in space-time, from a requirement that the visual system should be invariant to the natural types of image transformations that occur in its environment.
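The Gaussian derivative receptive fields underlying this framework can be illustrated numerically. The following is a minimal sketch (not code from the paper), assuming sampled, truncated Gaussian kernels and separable filtering; the function names are chosen here for illustration only:

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """Sampled 1-D Gaussian kernel (a crude discretization of the continuous model)."""
    if radius is None:
        radius = int(4 * sigma)
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    return g / g.sum()

def gaussian_derivative_kernel(sigma, order):
    """First- or second-order Gaussian derivative kernel along one axis."""
    radius = int(4 * sigma)
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    g /= g.sum()
    if order == 1:
        return -x / sigma**2 * g
    if order == 2:
        return (x**2 - sigma**2) / sigma**4 * g
    raise ValueError("order must be 1 or 2")

def receptive_field_response(image, sigma, dx=0, dy=0):
    """Separable 2-D Gaussian (derivative) filtering: smooth or
    differentiate along each axis independently."""
    kx = gaussian_derivative_kernel(sigma, dx) if dx else gaussian_kernel(sigma)
    ky = gaussian_derivative_kernel(sigma, dy) if dy else gaussian_kernel(sigma)
    out = np.apply_along_axis(lambda r: np.convolve(r, kx, mode='same'), 1, image)
    out = np.apply_along_axis(lambda c: np.convolve(c, ky, mode='same'), 0, out)
    return out
```

As a sanity check, the smoothing kernel integrates to one and a derivative kernel to zero, so a derivative response to a constant image vanishes in the interior.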

Conflict of interest statement

Competing Interests: The author has declared that no competing interests exist.

Figures

Figure 1
Figure 1. The requirement of non-enhancement of local extrema restricts the class of possible image operations by formalizing the notion that new image structures must not be created with increasing scale: the value at a local maximum must not increase, and the value at a local minimum must not decrease.
Figure 2
Figure 2. Spatial receptive fields formed by the 2-D Gaussian kernel with its partial derivatives up to order two.
The corresponding family of receptive fields is closed under translations, rotations and scaling transformations.
Figure 3
Figure 3. Spatial receptive fields formed by affine Gaussian kernels and directional derivatives of these.
The corresponding family of receptive fields is closed under general affine transformations of the spatial domain, including translations, rotations, scaling transformations and perspective foreshortening.
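A sampled affine Gaussian kernel of the kind shown here can be sketched directly from its closed form g(x; Σ) = exp(−xᵀΣ⁻¹x/2)/(2π√det Σ). This is an illustrative sketch, not code from the paper; the covariance matrix below is an arbitrary example:

```python
import numpy as np

def affine_gaussian(Sigma, radius):
    """Sampled affine Gaussian kernel g(x; Sigma) over a square grid."""
    Sinv = np.linalg.inv(Sigma)
    xs = np.arange(-radius, radius + 1, dtype=float)
    X, Y = np.meshgrid(xs, xs)  # x varies along columns, y along rows
    quad = Sinv[0, 0]*X**2 + 2*Sinv[0, 1]*X*Y + Sinv[1, 1]*Y**2
    return np.exp(-0.5 * quad) / (2*np.pi*np.sqrt(np.linalg.det(Sigma)))

# Elongated kernel: larger spatial variance along x than along y
Sigma = np.array([[16.0, 0.0], [0.0, 4.0]])
g = affine_gaussian(Sigma, radius=20)
```

Directional derivatives of such kernels then follow by differencing or by multiplying with the appropriate polynomial factors, as for the rotationally symmetric case.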
Figure 4
Figure 4. Non-causal and space-time separable spatio-temporal receptive fields over 1+1-D space-time as generated by the Gaussian spatio-temporal scale-space model with .
This family of receptive fields is closed under rescalings of the spatial and temporal dimensions. (Horizontal axis: space formula image. Vertical axis: time formula image.)
Figure 5
Figure 5. Non-causal and velocity-adapted spatio-temporal receptive fields over 1+1-D space-time as generated by the Gaussian spatio-temporal scale-space model for a non-zero image velocity
formula image. This family of receptive fields is closed under rescalings of the spatial and temporal dimensions as well as Galilean transformations. (Horizontal axis: space formula image. Vertical axis: time formula image.)
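Velocity adaptation in the non-causal Gaussian model amounts to shearing the spatial smoothing along the line x = vt. A minimal sketch (illustrative only, with arbitrary parameter values) of such a kernel over 1+1-D space-time:

```python
import numpy as np

def velocity_adapted_kernel(s, tau, v, xr, tr):
    """Non-causal velocity-adapted spatio-temporal Gaussian over 1+1-D
    space-time: a spatial Gaussian sheared along x = v*t, multiplied by
    a temporal Gaussian."""
    x = np.arange(-xr, xr + 1, dtype=float)
    t = np.arange(-tr, tr + 1, dtype=float)
    T, X = np.meshgrid(t, x, indexing='ij')  # rows: time, cols: space
    return (np.exp(-(X - v*T)**2 / (2*s)) / np.sqrt(2*np.pi*s)
            * np.exp(-T**2 / (2*tau)) / np.sqrt(2*np.pi*tau))

K = velocity_adapted_kernel(s=4.0, tau=9.0, v=1.0, xr=15, tr=8)
```

With v = 0 this reduces to the space-time separable kernel of Figure 4; for non-zero v the ridge of the kernel follows the motion line x = vt.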
Figure 6
Figure 6. Time-causal and space-time separable spatio-temporal receptive fields over a 1+1-D space-time as generated by the time-causal spatio-temporal scale-space model with .
This family of receptive fields is closed under rescalings of the spatial and temporal dimensions. (Horizontal axis: space formula image. Vertical axis: time formula image.)
Figure 7
Figure 7. Time-causal and velocity-adapted spatio-temporal receptive fields over a 1+1-D space-time as generated by the time-causal spatio-temporal scale-space model with
formula image. This family of receptive fields is closed under rescalings of the spatial and temporal dimensions as well as Galilean transformations. (Horizontal axis: space formula image. Vertical axis: time formula image.)
Figure 8
Figure 8. Spatial component of receptive fields in the LGN.
(left) Receptive fields in the LGN have approximately circular center-surround responses in the spatial domain, as reported by DeAngelis et al. (right) In terms of Gaussian derivatives, this spatial response profile can be modelled by the Laplacian of the Gaussian formula image, here with formula image.
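The Laplacian-of-Gaussian model of this center-surround profile has the closed form ∇²g = (r² − 2σ²)/σ⁴ · g(r; σ), which can be sampled directly. An illustrative sketch (parameter values are arbitrary, not taken from the figure):

```python
import numpy as np

def laplacian_of_gaussian(sigma, radius):
    """Sampled rotationally symmetric Laplacian-of-Gaussian kernel:
    negative center, positive surround (the off-center polarity;
    negate for an on-center model)."""
    xs = np.arange(-radius, radius + 1, dtype=float)
    X, Y = np.meshgrid(xs, xs)
    r2 = X**2 + Y**2
    g = np.exp(-r2 / (2*sigma**2)) / (2*np.pi*sigma**2)
    return (r2 - 2*sigma**2) / sigma**4 * g

log_kernel = laplacian_of_gaussian(sigma=3.0, radius=12)
```

The kernel changes sign at r = σ√2 and integrates to approximately zero, so it responds to local intensity structure rather than to the mean illumination level.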
Figure 9
Figure 9. Spatial component of receptive fields in V1.
(left) Simple cells in the striate cortex usually have a strong directional preference in the spatial domain, as reported by DeAngelis et al. (right) In terms of Gaussian derivatives, first-order directional derivatives of anisotropic affine Gaussian kernels, here aligned to the coordinate directions formula image and with formula image and formula image, can be used as a model for simple cells with a strong directional preference.
Figure 10
Figure 10. Affine Gaussian receptive fields generated for a set of covariance matrices that correspond to an approximately uniform distribution on a hemisphere in the 3-D environment, which is then projected onto a 2-D image plane.
(left) Zero-order receptive fields. (right) First-order receptive fields.
Figure 11
Figure 11. Non-separable spatio-temporal receptive fields in V1.
(top row) Examples of non-separable spatio-temporal receptive field profiles in the striate cortex as reported by DeAngelis et al.: (top left) a receptive field reminiscent of a second-order derivative in tilted space-time (compare with the left column in figure 11); (top right) a receptive field reminiscent of a third-order derivative in tilted space-time (compare with the right column in figure 11). (middle and bottom rows) Non-separable spatio-temporal receptive fields obtained by applying velocity-adapted second- and third-order derivative operations in space-time to spatio-temporal smoothing kernels generated by the spatio-temporal scale-space concept. (middle left) Gaussian spatio-temporal kernel formula image with formula image. (middle right) Gaussian spatio-temporal kernel formula image with formula image. (lower left) Time-causal spatio-temporal kernel formula image with formula image. (lower right) Time-causal spatio-temporal kernel formula image with formula image. (Horizontal dimension: space formula image. Vertical dimension: time formula image.)
Figure 12
Figure 12. Illustration of how scale selection can be performed from receptive field responses by computing scale-normalized Gaussian derivative operators at different scales and then detecting local extrema over scale.
Here, so-called scale-space signatures have been computed at the centers of two different lamps at different distances to the observer. Notice how the local extrema over scale are assumed at coarser scales for the nearby lamp than for the distant lamp. When measured in units of dimension length, the ratio between these scale estimates agrees with the ratio between the sizes of the projected lamps in the image domain.
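The scale selection mechanism can be worked through analytically for an ideal Gaussian blob of variance t0: its scale-space representation at scale t is a Gaussian of variance t0 + t, so the magnitude of the scale-normalized Laplacian at the blob center is t/(π(t0 + t)²), which is maximal at t = t0. A small numerical sketch of this (an idealized stand-in for the lamp example, not the paper's code):

```python
import numpy as np

def selected_scale(t0, scales):
    """Scale at which the scale-normalized Laplacian magnitude
    |t * Laplacian L| at the center of a Gaussian blob of variance t0
    attains its maximum over scale."""
    response = scales / (np.pi * (scales + t0)**2)
    return scales[np.argmax(response)]

scales = np.geomspace(0.5, 200.0, 2000)
t_near = selected_scale(64.0, scales)   # larger blob: e.g. a nearby lamp
t_far = selected_scale(16.0, scales)    # smaller blob: e.g. a distant lamp
```

The selected scale tracks the blob variance, and the ratio of scale estimates, measured in units of length (√t), equals the ratio of the projected blob sizes, which is the property the lamp example illustrates.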
Figure 13
Figure 13. Illustration of how scale normalization can be performed by rescaling local image structures using scale information obtained from a scale selection mechanism.
Here, the two windows selected in figure 12 have been transformed to a common scale-invariant reference frame by normalizing them with respect to the scale levels at which the scale-normalized Laplacian and the scale-normalized determinant of the Hessian respectively assumed their global extrema over scale. Note the similarities of the resulting scale normalized representations, although they correspond to physically different objects in the world.
Figure 14
Figure 14. Illustration of how affine invariance can be achieved by normalization to an affine invariant reference frame determined from a second-moment matrix.
The left column shows three views of a wall at Moderna Museet in Stockholm with different amounts of perspective foreshortening due to variations in the viewing direction relative to the surface normal of the wall. The right column shows the result of performing affine normalization of a window in each image independently (with the windows centered at corresponding image points on the wall) using a series of affine transformations proportional to formula image until an affine invariant fixed-point of (81) has been reached. Notice how this leads to a major compensation for the perspective foreshortening effects, which can be used to significantly improve the performance of methods for image matching and object recognition under perspective projection. With regard to receptive fields, the use of an affine family of receptive field profiles makes it possible to define image operations in the image domain that are equivalent to the use of receptive fields based on rotationally symmetric smoothing operations in an affine invariant reference frame.
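The core linear-algebra step behind this affine normalization is whitening by the second-moment matrix: given a second-moment matrix μ accumulated from image gradients, transforming with A ∝ μ^(-1/2) makes the transformed second-moment matrix proportional to the identity, which is the fixed-point condition. A minimal sketch of that step, using synthetic anisotropic gradient samples in place of actual image gradients:

```python
import numpy as np

def affine_normalization_transform(mu):
    """Return A = mu^(-1/2) via an eigendecomposition; transforming the
    domain with A whitens the second-moment matrix (A mu A^T = I)."""
    w, V = np.linalg.eigh(mu)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

# Synthetic anisotropic gradient distribution standing in for image
# gradients measured in a window (the shear matrix is arbitrary).
rng = np.random.default_rng(0)
grads = rng.normal(size=(10000, 2)) @ np.array([[3.0, 1.0], [0.0, 1.0]])
mu = grads.T @ grads / len(grads)
A = affine_normalization_transform(mu)
```

In practice the second-moment matrix is recomputed after each warp and the transformation iterated until the fixed-point is reached, as described in the caption.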
Figure 15
Figure 15. Illustration of how receptive field responses may be affected by unknown relative motions between objects in the world and the observer and of how this effect can be handled by velocity adaptation.
The first row shows space-time traces of a walking person taken with (left column) a stabilized camera with the viewing direction following the motion of the person and (right column) a stationary camera with a fixed viewing direction, for a video sequence used for the experiments in Laptev and Lindeberg. The second row shows Laplacian receptive field responses computed in the two domains from space-time separable receptive fields without velocity adaptation. In the third row, these receptive field responses from the stationary camera have been space-time warped to the reference frame of the stabilized camera. As can be seen from the data, the receptive field responses are quite different in the two domains, which causes problems if one tries to match them. Hence, spatio-temporal recognition based on space-time separable receptive fields alone can be rather difficult. In the fourth row, the receptive field responses have instead been computed with regional velocity adaptation that aligns the space-time orientation of the receptive fields to a regional velocity estimate. In the fifth row, the velocity-adapted receptive field responses from the stationary camera have been space-time warped to the reference frame of the stabilized camera. As can be seen from a comparison with the corresponding result for the non-adapted receptive field responses in the third row, the use of velocity adaptation implies better stability of receptive field responses under unknown relative motions between objects in the world and the observer. For simplicity of illustration, the velocity estimates used for velocity adaptation have here been computed regionally over a central region of the spatio-temporal volume containing the spatio-temporal gait pattern.
In Laptev and Lindeberg a corresponding local method for velocity adaptation is presented, where the velocity estimates for velocity adaptation are instead computed locally from extremum responses of Laplacian receptive field responses over different image velocities and spatio-temporal scales for each point in space-time.
Figure 16
Figure 16. Illustration of the effect of computing Laplacian receptive field responses from image intensities defined on (left column) a linear intensity scale vs. (right column) a logarithmic intensity scale for an image with substantial illumination variations.
As can be seen from the figure, the magnitudes of the Laplacian receptive field responses are substantially higher in the left, sunlit part of the house than in the right part in the shade when the Laplacian responses are computed on a linear intensity scale, whereas the difference in amplitude between the left and right parts of the house becomes substantially lower when the receptive field responses are computed on a logarithmic intensity scale.
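The reason for this behaviour is that a multiplicative illumination change becomes an additive constant under a logarithmic intensity mapping, and any derivative-based receptive field cancels additive constants exactly. A small numerical sketch with a synthetic image and a discrete Laplacian (illustrative only, not the figure's data):

```python
import numpy as np

def laplacian(f):
    """Discrete 5-point Laplacian evaluated on the interior of a 2-D array."""
    return (f[1:-1, 2:] + f[1:-1, :-2] + f[2:, 1:-1] + f[:-2, 1:-1]
            - 4 * f[1:-1, 1:-1])

rng = np.random.default_rng(1)
image = rng.uniform(0.1, 1.0, size=(32, 32))
shaded = 0.25 * image   # multiplicative illumination change (e.g. shadow)

# On a linear intensity scale the responses scale with the illumination...
lin = laplacian(image)
lin_shaded = laplacian(shaded)
# ...but on a logarithmic scale the multiplicative factor becomes an
# additive constant, which the Laplacian cancels.
log_resp = laplacian(np.log(image))
log_shaded = laplacian(np.log(shaded))
```

The linear-scale responses differ by exactly the illumination factor, while the log-scale responses are identical, matching the qualitative behaviour seen in the figure.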
Figure 17
Figure 17. Schematic overview of how the covariance properties of the receptive fields in the proposed receptive field model lead to covariant image measurements, from which truly invariant image representations can then be obtained by complementary selection mechanisms that operate over the parameters of the receptive fields corresponding to variations over scale, affine image deformations and Galilean motions.
For pure scaling transformations, the parameter formula image of the receptive fields will be a scalar scale parameter, whereas a covariance matrix formula image is needed to capture more general affine image deformations. For spatio-temporal image data, an additional temporal scale parameter formula image and an additional image velocity parameter formula image are furthermore needed.

References

    1. Biederman I, Cooper EE (1992) Size invariance in visual object priming. Journal of Experimental Psychology: Human Perception and Performance 18: 121–133.
    2. Logothetis NK, Pauls J, Poggio T (1995) Shape representation in the inferior temporal cortex of monkeys. Current Biology 5: 552–563.
    3. Ito M, Tamura H, Fujita I, Tanaka K (1995) Size and position invariance of neuronal responses in monkey inferotemporal cortex. Journal of Neurophysiology 73: 218–226.
    4. Furmanski CS, Engel SA (2000) Perceptual learning in object recognition: Object specificity and size invariance. Vision Research 40: 473–484.
    5. Hung CP, Kreiman G, Poggio T, DiCarlo JJ (2005) Fast readout of object identity from macaque inferior temporal cortex. Science 310: 863–866.

Grant support

Funding was received from The Swedish Research Council contract 2010–4766; The Royal Swedish Academy of Sciences; and The Knut and Alice Wallenberg foundation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.