eNeuro. 2020 Jan 8;7(1):ENEURO.0411-19.2019.
doi: 10.1523/ENEURO.0411-19.2019. Print 2020 Jan/Feb.

Optimized but Not Maximized Cue Integration for 3D Visual Perception

Ting-Yu Chang et al. eNeuro.

Abstract

Reconstructing three-dimensional (3D) scenes from two-dimensional (2D) retinal images is an ill-posed problem. Despite this, 3D perception of the world based on 2D retinal images is seemingly accurate and precise. The integration of distinct visual cues is essential for robust 3D perception in humans, but it is unclear whether this is true for non-human primates (NHPs). Here, we assessed 3D perception in macaque monkeys using a planar surface orientation discrimination task. Perception was accurate across a wide range of spatial poses (orientations and distances), but precision was highly dependent on the plane's pose. The monkeys achieved robust 3D perception by dynamically reweighting the integration of stereoscopic and perspective cues according to their pose-dependent reliabilities. Errors in performance could be explained by a prior resembling the 3D orientation statistics of natural scenes. We used neural network simulations based on 3D orientation-selective neurons recorded from the same monkeys to assess how neural computation might constrain perception. The perceptual data were consistent with a model in which the responses of two independent neuronal populations representing stereoscopic cues and perspective cues (with perspective signals from the two eyes combined using nonlinear canonical computations) were optimally integrated through linear summation. Perception of combined-cue stimuli was optimal given this architecture. However, an alternative architecture in which stereoscopic cues, left eye perspective cues, and right eye perspective cues were represented by three independent populations yielded two times greater precision than the monkeys. This result suggests that, due to canonical computations, cue integration for 3D perception is optimized but not maximized.
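The reliability-dependent reweighting described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each cue yields an independent von Mises likelihood over tilt and that the prior is flat; under those assumptions the product of the two likelihoods is again von Mises, with each cue weighted by its concentration κ. The function name `combine_von_mises` and the numbers in the usage example are hypothetical and are not taken from the paper.

```python
import math

def combine_von_mises(mu1, kappa1, mu2, kappa2):
    """Combine two independent von Mises likelihoods over tilt (radians).

    Assumption (not from the paper): a flat prior and independent cues, so the
    combined estimate is the normalized product of the two likelihoods.
    Returns (mean, kappa) of the resulting von Mises distribution.
    """
    # Each cue contributes a vector of length kappa pointing at its mean.
    x = kappa1 * math.cos(mu1) + kappa2 * math.cos(mu2)
    y = kappa1 * math.sin(mu1) + kappa2 * math.sin(mu2)
    kappa = math.hypot(x, y)               # combined concentration
    mu = math.atan2(y, x) % (2 * math.pi)  # combined mean in [0, 2*pi)
    return mu, kappa

if __name__ == "__main__":
    # Hypothetical example: both cues signal a tilt of 270 degrees.
    mu, kappa = combine_von_mises(math.radians(270), 2.0, math.radians(270), 3.0)
    print(round(math.degrees(mu), 1), round(kappa, 3))
```

When the two cues agree, the combined κ is simply the sum of the single-cue values, so the more reliable cue dominates the estimate. When they conflict, the combined mean is pulled toward the cue with the larger κ.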

Keywords: 3D visual perception; canonical computations; divisive normalization; optimal cue integration; perspective; stereoscopic.

PubMed Disclaimer

Figures

Figure 1.
3D cue reliabilities depend on object pose. A, Stereoscopic cue reliability decreases with distance. Equivalent changes in object distance produce smaller retinal image changes at greater distances. This is illustrated with an observer fixating the black dot. The distance between black and magenta dots (ΔBM) is equal to the distance between magenta and cyan dots (ΔMC), but the retinal change is larger for ΔBM than ΔMC. B, The reliability of perspective cues increases with orientation-in-depth (slant). Equivalent slant changes produce larger changes in the rate at which parallel lines converge in the 2D projection at larger base slants. This is illustrated with a checkerboard rotated about the horizontal axis passing through the red dot. Colored lines are parallel in the world. A 20° slant (S) rotation produces a smaller perspective change between 0° and 20° (top row) than between 40° and 60° (bottom row).
Figure 2.
Stimuli and discrimination task. A, Tilt (T) and slant (S) are polar coordinates describing planar surface orientation. Tilt specifies the direction that the plane is oriented in depth. Slant specifies how much it is oriented in depth. B–E, Example planes (T = 270°, S = 60°). For clarity, the dot size is exaggerated and the dot number is reduced from the actual experiments. B, Combined-cue stimulus at 57 cm (fixation distance). C, Combined-cue stimulus at 77 cm (all dots behind the plane of fixation). D, Stereoscopic cue stimulus at 57 cm. E, Perspective cue stimulus at 57 cm (left eye presentation). F, Eight-alternative tilt discrimination task. Fixation was held on a target presented at 57 cm (screen distance) for 300 ms. A plane then appeared for 1000 ms. Fixation was then held for 500–1500 ms before the fixation target disappeared and eight choice targets appeared. The plane’s tilt was reported through a saccade to a choice target (for example, the bottom target for a bottom-near plane, T = 270°). Planes are illustrated here using red-green anaglyphs.
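As a rough illustration of the tilt and slant parameterization, the sketch below converts a (tilt, slant) pair into a unit surface normal and back. The axis convention is an assumption made here and is not specified in the caption: x rightward, y upward, z along the line of sight, with the image-plane projection of the normal pointing along the tilt direction. Whether that direction corresponds to the near or the far side of the plane depends on the convention in use, so the mapping to "bottom-near" is deliberately left open. The function names are hypothetical.

```python
import math

def surface_normal(tilt_deg, slant_deg):
    """Unit normal of a plane with the given tilt and slant (degrees).

    Assumed convention (not from the paper): slant is the angle between the
    normal and the z-axis; tilt is the direction of the normal's projection
    onto the x-y (image) plane.
    """
    t = math.radians(tilt_deg)
    s = math.radians(slant_deg)
    return (math.sin(s) * math.cos(t), math.sin(s) * math.sin(t), math.cos(s))

def tilt_slant(normal):
    """Inverse mapping: recover (tilt, slant) in degrees from a unit normal."""
    nx, ny, nz = normal
    slant = math.degrees(math.acos(max(-1.0, min(1.0, nz))))
    tilt = math.degrees(math.atan2(ny, nx)) % 360.0
    return tilt, slant

if __name__ == "__main__":
    n = surface_normal(270.0, 60.0)  # the example plane in panels B-E
    print(tuple(round(c, 3) for c in n))
    print(tuple(round(a, 1) for a in tilt_slant(n)))
```

At zero slant the plane is frontoparallel and tilt is undefined, which is why the inverse mapping is only meaningful for slants greater than zero.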
Figure 3.
Tilt perception for combined-cue stimuli. A, Probability density functions describing the errors in reported tilts made by Monkey L for each slant–distance combination, calculated using all eight tilts. Columns correspond to distance, and colors to slant. The probability that an error of a given ΔTilt was made is shown with a point. Correct choices: ΔTilt = 0°. Solid curves are fitted von Mises probability density functions. At high precisions, there is some deviation between the point representing the probability that the monkey was correct and the probability density function. This deviation reflects discrete versus continuous representations of the area between sampled tilts, and that the sampling interval limits the maximum κ that can be estimated. An upper bound of κ = 18 was set based on simulations (see Materials and Methods). B, Heat maps showing the precision (von Mises κ) of tilt perception as a function of slant and distance for both monkeys, calculated using all eight tilts. Red hues indicate lower precision and yellow hues indicate higher precision. Right marginals show κ as a function of slant for each distance. Precision increased monotonically with slant. Upper marginals show κ as a function of distance for each slant. Precision had an inverted U shape as a function of distance. Also see Extended Data Figure 3-1.
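One way to see how precision might be estimated from eight-alternative choices is the simplified sketch below. It is not the authors' fitting procedure, which is described in their Materials and Methods. It assumes the error distribution is a von Mises centered on 0° that is sampled at the eight tilt errors rather than integrated over bins, and it finds κ by a grid search capped at the upper bound of 18 mentioned in the caption. The function names and the grid step are hypothetical.

```python
import math

# The eight possible tilt errors (degrees) in an eight-alternative task.
TILT_ERRORS_DEG = [-135, -90, -45, 0, 45, 90, 135, 180]

def discrete_von_mises(kappa):
    """Choice probabilities from a zero-mean von Mises sampled at eight errors.

    Simplification (an assumption): the density is evaluated at the sampled
    errors and renormalized, instead of being integrated between them.
    """
    # Subtracting 1 inside the exponent keeps the weights numerically tame.
    weights = [math.exp(kappa * (math.cos(math.radians(d)) - 1.0))
               for d in TILT_ERRORS_DEG]
    total = sum(weights)
    return [w / total for w in weights]

def fit_kappa(counts, kappa_max=18.0, step=0.01):
    """Grid-search maximum-likelihood kappa for counts of each tilt error."""
    best_kappa, best_ll = 0.0, -math.inf
    for i in range(int(round(kappa_max / step)) + 1):
        kappa = i * step
        probs = discrete_von_mises(kappa)
        ll = sum(c * math.log(p) for c, p in zip(counts, probs) if c > 0)
        if ll > best_ll:
            best_kappa, best_ll = kappa, ll
    return best_kappa

if __name__ == "__main__":
    # Hypothetical counts: mostly correct choices, a few 45-degree errors.
    print(fit_kappa([0, 1, 12, 70, 14, 2, 0, 1]))
```

With error-free data the likelihood keeps increasing with κ, so the estimate runs into the cap. This is consistent with the caption's point that the sampling interval limits the largest κ that can be estimated.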
Figure 4.
Precision of tilt perception for cue-isolated stimuli. A, Stereoscopic cues. Precision (κ) increased monotonically with slant and had an inverted U shape as a function of distance. Performance was at chance level for combinations of small slants and large distances (outlined in black). B, Perspective cues. Precision increased monotonically with slant and was independent of distance. Plots follow the format in Figure 3B. Also see Extended Data Figure 3-1.
Figure 5.
Stereoscopic cue controls. A, Stereoscopic cue stimuli were viewed binocularly (blue curves) or monocularly (left eye stimulated: orange; right eye stimulated: yellow). Plotted are probability density functions describing the errors in reported tilts made by each monkey, calculated using all eight tilts. The probability that an error of a given ΔTilt was made is shown with a point. Correct choices: ΔTilt = 0°. Solid curves are fitted von Mises probability density functions. Chance performance is indicated by dashed black lines. B, Precision (κ) versus dot number for planes at 57 cm (Monkey L: purple; Monkey F: red) and 97 cm (Monkey F: green). Error bars are SEM across sessions.
Figure 6.
Biases in tilt perception occurred at low precisions and were consistent with a prior over 3D tilt. A–C, Perceived tilt (presented tilt + mean of the von Mises fit to the error distribution) versus presented tilt (Monkey L: N = 24 slant–distance combinations per tilt; Monkey F: N = 32 per tilt). Diagonals are identity lines. Greater vertical distance from the identity line indicates greater bias. The fill opacity indicates precision (κ). Asterisks mark biases that were significantly different from 0°. A, Combined-cue stimuli. B, Stereoscopic cue stimuli. At low precisions, perception was pulled toward 270° (bottom-near), marked by horizontal dashed lines. C, Perspective cue stimuli. D, Priors over 3D tilt. The angular variable is surface tilt and the radial variable is the probability density value. Shading indicates the bootstrapped 95% confidence interval.
Figure 7.
Perceptual cue integration. A–E, Example densities for each cue condition, calculated using all eight tilts. Solid curves are von Mises fits. Dotted black curves are optimal predictions. Insets show cue-isolated κ ratios. A, Slant = 30°, distance = 77 cm (Monkey F). B, Slant = 30°, distance = 87 cm (Monkey F). C, Slant = 30°, distance = 107 cm (Monkey L). D, Slant = 45°, distance = 77 cm (Monkey L). E, Slant = 15°, distance = 137 cm (Monkey F). Combined-cue perception depended entirely on perspective cues. F, Distribution of cue-isolated κ ratios (N = 56 slant–distance combinations, both monkeys). The triangle marks the mean ratio. G, Optimal versus observed combined-cue precision calculated using all eight tilts for each slant–distance combination (N = 56). H, Optimal versus observed combined-cue precision calculated for each tilt × slant × distance combination (N = 448). Type-II regression lines are shown in yellow (κ = 18 excluded). Insets show correlations and regression line equations. Also see Extended Data Figure 3-1.
Figure 8.
Optimized but not maximized cue integration. A, Schematics of three architectures for combining responses to stereoscopic cues (rS), left eye perspective cues (rPL), and right eye perspective cues (rPR). Top, Three independent populations represent each cue (orange). Middle, Two independent populations represent stereoscopic cues and perspective cues from both eyes (green). Bottom, One population represents all cues (magenta). Right, Combined-cue representations for each architecture. Points show the response of each model neuron, ordered along the x-axis by preferred tilt, to a single stimulus presentation (slant = 30°, distance = 37 cm). B, Tilt posteriors, p(T|r), decoded from the combined-cue representations. Black dots show corresponding data from Monkey L. Given the same cue-isolated responses, precision was greatest for the three independent populations model and lowest for the one population model. The posterior of the two independent populations model matched the monkey’s data. C, Comparisons of decoded model precisions and observed monkey precisions. Each point corresponds to one slant–distance combination (N = 56, both monkeys). The three independent populations model was more precise than the monkeys (nearly all points are above the dashed black identity line). The precisions from the two independent populations model matched the monkeys’ precisions (points are distributed along the identity line). The one population model was less precise than the monkeys (nearly all points are below the identity line). Solid lines are Type-II regressions (κ = 18 excluded).
