, 14 (2)

Optimal Disparity Estimation in Natural Stereo Images

Johannes Burge et al. J Vis.

Abstract

A great challenge of systems neuroscience is to understand the computations that underlie perceptual constancies, the ability to represent behaviorally relevant stimulus properties as constant even when irrelevant stimulus properties vary. As signals proceed through the visual system, neural states become more selective for properties of the environment, and more invariant to irrelevant features of the retinal images. Here, we describe a method for determining the computations that perform these transformations optimally, and apply it to the specific computational task of estimating a powerful depth cue: binocular disparity. We simultaneously determine the optimal receptive field population for encoding natural stereo images of locally planar surfaces and the optimal nonlinear units for decoding the population responses into estimates of disparity. The optimal processing predicts well-established properties of neurons in cortex. Estimation performance parallels important aspects of human performance. Thus, by analyzing the photoreceptor responses to natural images, we provide a normative account of the neurophysiology and psychophysics of absolute disparity processing. Critically, the optimal processing rules are not arbitrarily chosen to match the properties of neurophysiological processing, nor are they fit to match behavioral performance. Rather, they are dictated by the task-relevant statistical properties of complex natural stimuli. Our approach reveals how selective invariant tuning, especially for properties not trivially available in the retinal images, could be implemented in neural systems to maximize performance in particular tasks.

Keywords: Bayesian statistics; complex cells; decoding; depth perception; disparity energy model; encoding; hierarchical model; ideal observer; invariance; natural scene statistics; perceptual constancy; population code; selectivity; simple cells; stereopsis.

Figures

Figure 1
Natural scene inputs, disparity geometry, and example left and right eye signals. (a) Animals with front facing eyes. (b) Example natural images used in the analysis. (c) Stereo geometry. The eyes are fixated and focused at a point straight ahead at 40 cm. We considered retinal disparity patterns corresponding to fronto-parallel and slanted surfaces. Non-planar surfaces were also considered (see Discussion). (d) Photographs of natural scenes are texture mapped onto planar fronto-parallel or slanted surfaces. Here, the left and right eye retinal images are perspective projections (inset) of a fronto-parallel surface with 5 arcmin of uncrossed disparity. Left and right eye signals are obtained by vertically averaging each image; these are the signals available to neurons with vertically oriented receptive fields. The signals are not identical shifted copies of each other because of perspective projection, added noise, and cosine windowing (see text). We note that across image patches there is considerable signal variation due to stimulus properties (e.g., texture) unrelated to disparity. A selective, invariant neural population must be insensitive to this variation.
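The signal construction described in panel (d) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the raised-cosine (Hann) shape is an assumed form of the cosine window, and the perspective projection and added noise are omitted.

```python
import math

def vertical_average(patch):
    """Average a 2-D patch (a list of rows) down each column to get a 1-D signal."""
    n_rows = len(patch)
    return [sum(column) / n_rows for column in zip(*patch)]

def cosine_window(n):
    """Raised-cosine (Hann) window of length n; the exact window shape is an assumption."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def windowed_signal(patch):
    """Vertically average a patch, then taper the 1-D signal with the window."""
    signal = vertical_average(patch)
    window = cosine_window(len(signal))
    return [s * w for s, w in zip(signal, window)]

# Toy 2 x 3 "image patch" (made-up luminance values).
patch = [[1.0, 2.0, 3.0],
         [3.0, 4.0, 5.0]]
signal = vertical_average(patch)
windowed = windowed_signal(patch)
```

Applying the same steps to the left eye and right eye patches gives the pair of 1-D signals that vertically oriented receptive fields would see.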
Figure 2
Hierarchical processing steps in optimal disparity estimation. (a) The photoreceptor responses are computed for each of many natural images, for each of many different disparities. (b) The optimal filters for disparity estimation are learned from this collection of photoreceptor responses to natural stimuli. These are the eight most useful vertically oriented filters (receptive fields) (see also Figure 3a). Left and right eye filter weights are shown in gray. Filter responses are given by the dot product between the photoreceptor responses and filter weights. (c) The optimal selective, invariant units are constructed from the filter responses. Each unit in the population is tuned to a particular disparity. These units result from a unique combination of the optimal filter responses (see also Figure 7). (d) The optimal readout of the selective, invariant population response is determined. Each black dot shows the response of one of the disparity-tuned units in (c) to the particular image shown in (a). The peak of the population response is the optimal (MAP) estimate. Note that r, R, and R_LL are vectors representing the photoreceptor, filter, and disparity-tuned-unit population responses to particular stimuli. Red outlines represent the responses to the particular stereo-image patch in (a). The steps in this procedure are general, and will be useful for developing ideal observers for other estimation tasks.
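Two of the steps in this caption, the dot-product filter response in (b) and the peak readout in (d), can be sketched in a few lines. This is a hedged illustration only: the function names, the toy filter weights, and the toy population response are all made up, and the photoreceptor vector r is assumed to be the left and right eye signals concatenated.

```python
def filter_responses(r, filters):
    """Response of each filter: dot product of photoreceptor responses and filter weights."""
    return [sum(w * x for w, x in zip(f, r)) for f in filters]

def map_estimate(population_response, preferred_disparities):
    """Read out the preferred disparity of the most active disparity-tuned unit."""
    k = max(range(len(population_response)), key=population_response.__getitem__)
    return preferred_disparities[k]

# Toy photoreceptor responses (left-eye samples followed by right-eye samples).
r = [1.0, 2.0, 3.0, 4.0]
# Two made-up binocular filters, not the learned filters from the paper.
filters = [[1.0, 0.0, -1.0, 0.0],
           [0.5, 0.5, 0.5, 0.5]]
R = filter_responses(r, filters)

# Toy population response of three disparity-tuned units (preferred disparity in arcmin).
preferred = [-5.0, 0.0, 5.0]
R_LL = [-3.0, -1.2, -2.5]
estimate = map_estimate(R_LL, preferred)
```

Taking the unit with the largest response is the simplest form of peak readout; a finer estimate would interpolate between neighboring units.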
Figure 3
Optimal linear binocular receptive fields for disparity estimation. (a) Spatial receptive fields. Solid lines with closed symbols indicate the left eye filter components. Dashed lines with open symbols indicate right eye filter components. Insets show 2-D versions of the 1-D filters. (b) Luminance spatial frequency tuning versus spatial frequency bandwidth. The filters have bandwidths of approximately 1.5 octaves. (c) Phase and position shift coding for optimal binocular filters (circles) and binocular cells in macaque (squares) (Cumming & DeAngelis, 2001). Phase shifts are expressed in equivalent position shifts. Note that the filters were optimized for the fovea, whereas macaque cells were recorded from a range of different eccentricities.
Figure 4
Joint filter response distributions conditioned on disparity for filters F1 and F2 (see Figure 3a). (a) Joint filter responses to each of the 7,600 image patches in the training set. Different colors and symbols denote different disparity levels. Contours show Gaussian fits to the conditional filter response distributions. The black curves on the x and y axes represent the marginal response distributions, p(R1) and p(R2). (b) Posterior probability distributions, averaged across all stimuli at each disparity level if only filters F1 and F2 are used (dotted curves), and if all eight filter responses are used (solid curves). Using eight AMA filters instead of two increases disparity selectivity. Shaded areas represent 68% confidence intervals on the posterior probabilities; this variation is due to natural stimulus variation that is irrelevant for estimating disparity. Natural stimulus variation thus creates response variability even in hypothetical populations of noiseless neurons.
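The path from Gaussian fits in (a) to posterior probabilities in (b) can be illustrated with a two-filter toy example. This is a sketch under stated assumptions rather than the authors' procedure: the function names are hypothetical, the training responses are invented, the prior over disparity is taken to be flat, and only two filters are used so the 2 x 2 covariance can be inverted by hand.

```python
import math

def fit_gaussian_2d(samples):
    """Fit a mean and covariance (c11, c12, c22) to a list of (R1, R2) responses."""
    n = len(samples)
    m1 = sum(s[0] for s in samples) / n
    m2 = sum(s[1] for s in samples) / n
    c11 = sum((s[0] - m1) ** 2 for s in samples) / n
    c22 = sum((s[1] - m2) ** 2 for s in samples) / n
    c12 = sum((s[0] - m1) * (s[1] - m2) for s in samples) / n
    return (m1, m2), (c11, c12, c22)

def log_likelihood(R, mean, cov):
    """Log density of a 2-D Gaussian evaluated at the response pair R."""
    (m1, m2), (c11, c12, c22) = mean, cov
    det = c11 * c22 - c12 * c12
    d1, d2 = R[0] - m1, R[1] - m2
    quad = (c22 * d1 * d1 - 2.0 * c12 * d1 * d2 + c11 * d2 * d2) / det
    return -0.5 * quad - 0.5 * math.log(det) - math.log(2.0 * math.pi)

def posterior(R, fits):
    """Posterior over disparity levels given R, assuming a flat prior."""
    ll = {d: log_likelihood(R, mean, cov) for d, (mean, cov) in fits.items()}
    top = max(ll.values())                      # subtract the max for numerical stability
    unnorm = {d: math.exp(v - top) for d, v in ll.items()}
    z = sum(unnorm.values())
    return {d: u / z for d, u in unnorm.items()}

# Invented training responses: the two disparity levels differ in covariance, not in mean.
training = {
    -5.0: [(1.0, 1.0), (-1.0, -1.0), (0.5, 0.2), (-0.5, -0.2)],
    5.0: [(1.0, -1.0), (-1.0, 1.0), (0.5, -0.2), (-0.5, 0.2)],
}
fits = {d: fit_gaussian_2d(samples) for d, samples in training.items()}
post = posterior((0.8, 0.7), fits)
```

In this toy set the class means are both zero, so all of the disparity information is carried by how the two filter responses covary.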
Figure 5
Accuracy and precision of disparity estimates on test patches. (a) Disparity estimates of fronto-parallel surfaces displaced from fixation using the filters in Figure 3. Symbols represent the median MAP readout of posterior probability distributions (see Figure 4b). Error bars represent 68% confidence intervals on the estimates. Red boxes mark disparity levels not in the training set. Error bars at untrained levels are no larger than at the trained levels, indicating that the algorithm makes continuous estimates. (b) Precision of disparity estimates on a semilog axis. Symbols represent 68% confidence intervals (same data as error bars in Figure 5a). Human discrimination thresholds also rise exponentially as stereo stimuli are moved off the plane of fixation. The gray area shows the hyperacuity region. (c) Sign identification performance as a function of disparity. (d), (e) Same as in (b), (c), except that data is for surfaces with a cosine distribution of slants.
Figure 6
Biologically plausible implementation of selective, invariant tuning: processing schematics and disparity tuning curves for an AMA filter, a model complex cell, and a model log-likelihood (LL) neuron. For comparison, a disparity energy unit is also presented. In all cases, the inputs are contrast normalized photoreceptor responses. Disparity tuning curves show the mean response of each filter type across many natural image patches having the same disparity, for many different disparities. Shaded areas show response variation due to variation in irrelevant features in the natural patches (not neural noise). Selectivity for disparity and invariance to irrelevant features (external variation) increase as processing proceeds. (a) The filter response is obtained by linearly filtering the contrast normalized input signal with the AMA filter. (b) The model complex cell response is obtained by squaring the linear AMA filter response. (c) The response of an LL neuron, with preferred disparity δ_k, is obtained by a weighted sum of linear and squared filter responses. The weights can be positive/excitatory or negative/inhibitory (see Figure 7). The weights for an LL neuron with a particular preferred disparity are specified by the filter response distribution to natural images having that disparity (Figure 4a, Equations 4, 5, S14). In disparity estimation, the filter responses specify that the weights on the linear filter responses are near zero (see Supplement). (d) A standard disparity energy unit is obtained by simply summing the squared responses of two binocular linear filters that are in quadrature (90° out of phase with each other). Here, we show the tuning curve of a disparity energy unit having binocular linear filters (subunits) with left- and right-eye components that are also 90° out of phase with each other (i.e., each binocular subunit is selective for a nonzero disparity).
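The statement in (c), that an LL neuron is a weighted sum of linear and squared filter responses, follows from expanding a Gaussian log-likelihood. The sketch below works this out for two filters. It is an illustration under assumptions, not the paper's Equation 5 verbatim: the function names and numbers are hypothetical, the conditional response distribution is taken to be Gaussian, and the pairwise-sum term is left unscaled.

```python
def ll_weights(mean, cov):
    """Weights that reproduce a 2-D Gaussian log-likelihood up to an additive constant."""
    (m1, m2), (c11, c12, c22) = mean, cov
    det = c11 * c22 - c12 ** 2
    a11, a12, a22 = c22 / det, -c12 / det, c11 / det      # inverse covariance A
    w_lin = (a11 * m1 + a12 * m2, a12 * m1 + a22 * m2)     # weights on R1, R2
    w_sq = (-0.5 * a11 + 0.5 * a12, -0.5 * a22 + 0.5 * a12)  # weights on R1^2, R2^2
    w_pair = -0.5 * a12                                    # weight on (R1 + R2)^2
    return w_lin, w_sq, w_pair

def ll_response(R, weights):
    """Weighted sum of linear, squared, and squared pairwise-sum filter responses."""
    w_lin, w_sq, w_pair = weights
    R1, R2 = R
    return (w_lin[0] * R1 + w_lin[1] * R2
            + w_sq[0] * R1 ** 2 + w_sq[1] * R2 ** 2
            + w_pair * (R1 + R2) ** 2)

def gaussian_quadratic(R, mean, cov):
    """Direct evaluation of -0.5 (R - mean)^T A (R - mean), for comparison."""
    (m1, m2), (c11, c12, c22) = mean, cov
    det = c11 * c22 - c12 ** 2
    d1, d2 = R[0] - m1, R[1] - m2
    return -0.5 * (c22 * d1 * d1 - 2.0 * c12 * d1 * d2 + c11 * d2 * d2) / det

# Made-up response statistics for one preferred disparity.
mean, cov = (0.1, -0.2), (0.625, 0.55, 0.52)
weights = ll_weights(mean, cov)

# The two expressions differ only by a constant that does not depend on R.
offset_a = ll_response((0.8, 0.7), weights) - gaussian_quadratic((0.8, 0.7), mean, cov)
offset_b = ll_response((-0.3, 0.4), weights) - gaussian_quadratic((-0.3, 0.4), mean, cov)
```

When the mean is zero, the linear weights vanish and only the squared terms remain, which is consistent with the caption's remark that the weights on the linear filter responses are near zero.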
Figure 7
Constructing selective, invariant disparity-tuned units (LL neurons). (a) Tuning curves for several LL neurons, each with a different preferred disparity. Each point on the tuning curve represents the average response across a collection of natural stereo-images having the same disparity. Gray areas indicate ±1 SD of response due to stimulus-induced response variability. (b) Normalized weights on model complex cell responses (see Supplement, Equation 5, Figure 6c) for constructing the five LL neurons marked with arrows in (a). Positive weights are excitatory (red). Negative weights are inhibitory (blue). On-diagonal weights correspond to model complex cells having linear receptive fields like the filters in Figure 3a. Off-diagonal weights correspond to model complex cells having linear receptive fields like the scaled pairwise sums of the AMA filters (see Figures S3–S5). High spatial frequencies are not useful for estimating large disparities (Figure S7). Thus, the number of strongly weighted complex cells decreases as the magnitude of the preferred disparity increases from zero. (c) LL neuron bandwidth (i.e., full-width at half-height of disparity tuning curves) as a function of preferred disparity. Bandwidth increases approximately linearly with tuning.
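The bandwidth measure in (c), full-width at half-height, can be computed from a sampled tuning curve as sketched below. The function names and the triangular toy curve are hypothetical, and the sketch assumes a single-peaked, nonnegative tuning curve with a zero baseline, which may differ from how the authors normalized the LL neuron responses.

```python
def crossing(x0, y0, x1, y1, level):
    """Linearly interpolate the x at which the segment (x0, y0)-(x1, y1) reaches level."""
    return x0 + (level - y0) * (x1 - x0) / (y1 - y0)

def full_width_at_half_height(disparities, responses):
    """Width of the tuning curve at half its peak response (zero baseline assumed)."""
    peak = max(responses)
    half = peak / 2.0
    k = responses.index(peak)

    # Walk left from the peak until the response drops below half height.
    i = k
    while i > 0 and responses[i - 1] >= half:
        i -= 1
    left = disparities[0] if i == 0 else crossing(
        disparities[i - 1], responses[i - 1], disparities[i], responses[i], half)

    # Walk right from the peak in the same way.
    j = k
    last = len(responses) - 1
    while j < last and responses[j + 1] >= half:
        j += 1
    right = disparities[-1] if j == last else crossing(
        disparities[j + 1], responses[j + 1], disparities[j], responses[j], half)

    return right - left

# Toy triangular tuning curve centered on zero disparity (arcmin).
disparities = [-2.0, -1.0, 0.0, 1.0, 2.0]
responses = [0.0, 0.5, 1.0, 0.5, 0.0]
width = full_width_at_half_height(disparities, responses)
```

Repeating this for each LL neuron and plotting width against preferred disparity would give a curve of the kind shown in panel (c).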
