J Vis. 16(13), 2.

Estimating 3D Tilt From Local Image Cues in Natural Scenes

Johannes Burge et al. J Vis.

Abstract

Estimating three-dimensional (3D) surface orientation (slant and tilt) is an important first step toward estimating 3D shape. Here, we examine how three local image cues from the same location (disparity gradient, luminance gradient, and dominant texture orientation) should be combined to estimate 3D tilt in natural scenes. We collected a database of natural stereoscopic images with precisely co-registered range images that provide the ground-truth distance at each pixel location. We then analyzed the relationship between ground-truth tilt and image cue values. Our analysis is free of assumptions about the joint probability distributions and yields the Bayes optimal estimates of tilt, given the cue values. Rich results emerge: (a) typical tilt estimates are only moderately accurate and strongly influenced by the cardinal bias in the prior probability distribution; (b) when cue values are similar, or when slant is greater than 40°, estimates are substantially more accurate; (c) when luminance and texture cues agree, they often veto the disparity cue, and when they disagree, they have little effect; and (d) simplifying assumptions common in the cue combination literature are often justified for estimating tilt in natural scenes. The fact that tilt estimates are typically not very accurate is consistent with subjective impressions from viewing small patches of natural scenes. The fact that estimates are substantially more accurate for a subset of image locations is also consistent with subjective impressions and with the hypothesis that perceived surface orientation, at more global scales, is achieved by interpolation or extrapolation from estimates at key locations.
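The "assumption-free" Bayes optimal estimate described above is simply the conditional mean of ground-truth tilt given the observed cue value, which can be approximated directly from data by binning cue values and taking the circular mean of tilt within each bin. The sketch below illustrates the idea; it is not the authors' code, and the bin count and [0, 180) unsigned-tilt convention are illustrative assumptions. Tilt angles are doubled before averaging because unsigned tilt is an axial variable (0° and 180° are equivalent).

```python
import numpy as np

def conditional_mean_tilt(cue_deg, tilt_deg, n_bins=36):
    """Nonparametric MMSE-style estimator: circular mean of ground-truth
    tilt within each cue-value bin (illustrative sketch only)."""
    cue = np.asarray(cue_deg, dtype=float) % 180.0
    # Double angles so axial tilt (period 180 deg) becomes circular (period 360 deg)
    tilt2 = np.radians(2.0 * (np.asarray(tilt_deg, dtype=float) % 180.0))
    edges = np.linspace(0.0, 180.0, n_bins + 1)
    which = np.clip(np.digitize(cue, edges) - 1, 0, n_bins - 1)
    estimates = np.full(n_bins, np.nan)
    for b in range(n_bins):
        m = which == b
        if m.any():
            c, s = np.cos(tilt2[m]).mean(), np.sin(tilt2[m]).mean()
            # Halve the mean doubled angle to return to the [0, 180) convention
            estimates[b] = (np.degrees(np.arctan2(s, c)) / 2.0) % 180.0
    return edges, estimates
```

With enough samples per bin, this tabulated conditional mean is the minimum mean squared error (MMSE) estimator without assuming any parametric form for the joint distribution of tilt and cue values.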

Figures

Figure 1
Definition of slant and tilt. (A) Slant is the angle of rotation out of the reference plane (e.g., fronto-parallel plane). (B) Tilt is the orientation of the surface normal projected into the reference plane. It is always orthogonal to the axis about which the surface is rotated. (C) Slant and tilt together define a unique 3D surface orientation. The joint slant-tilt vector defines a point on the surface of a unit sphere. Different conventions exist for representing surface orientation. In this plot, we show tilts on [0, 180) and slants on [−90, 90). Other conventions represent tilt on [0, 360) and slants on [0, 90).
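The slant/tilt parameterization in Figure 1 can be made concrete with a small helper that converts a unit surface normal into the two angles. This is an illustrative sketch, not code from the paper; it assumes the viewer looks down the −z axis, so the fronto-parallel reference plane is the x-y plane, and it uses the tilt-in-[0, 360), slant-in-[0, 90] convention mentioned in the caption.

```python
import numpy as np

def slant_tilt_from_normal(n):
    """Convert a surface normal to (slant, tilt) in degrees.

    slant: rotation of the surface out of the fronto-parallel plane
           (0 deg when the normal points straight at the viewer).
    tilt:  orientation of the normal projected into the reference plane.
    """
    nx, ny, nz = np.asarray(n, dtype=float) / np.linalg.norm(n)
    slant = np.degrees(np.arccos(np.clip(nz, -1.0, 1.0)))
    tilt = np.degrees(np.arctan2(ny, nx)) % 360.0
    return slant, tilt
```

For example, a ground plane receding straight ahead has a normal that points upward and toward the viewer, e.g. (0, 1, 1) normalized, which yields slant 45° and tilt 90°, matching the cardinal case highlighted later in the article.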
Figure 2
Registered range and camera images. (A) Camera and laser range scanner mounted on a portable four-axis robotic gantry: (a) natural scene, (b) Nikon D700 DSLR camera, (c) Riegl VZ-400 3D range scanner, (d) custom robotic gantry. (B) A 200 × 200 pixel patch of a camera image of a stone structure mapped onto the range image. The registration is generally accurate to within ±1 pixel. Note that the shadows in the camera image coincide with the mortared seams in the stone structure.
Figure 3
Registered stereo pairs of camera images (top) and range scans (bottom). The gray scale in the bottom row indicates the range; white pixels indicate no data. Cross-fuse the left two images, or divergently fuse the right two images, to see the stereo-defined depth. The yellow rectangles indicate the image regions that were used for analysis. The range data were collected with a cylindrical projection surface. The missing range data in the upper right and left corners of the range scans result from the geometric procedures that were required to co-register the camera images and range scans.
Figure 4
Thumbnails of the 96 images in the data set. (A) Camera images. (B) Co-registered range images. Only the left image of each stereo-pair is shown.
Figure 5
Range and photographic stereo images, and range and image gradients for tilt estimation. (A) Range stereo images. Lighter gray levels correspond to larger distances. Divergently fuse the left two images, or cross-fuse the right two images. (B) Co-registered photographic stereo images. (C) Ground-truth range data, x and y components of the range gradient, and ground-truth tilts. The small yellow circle indicates the approximate size of the gradient operator (i.e., analysis window). (D) Luminance image data, and x and y components of the disparity gradient (only left-eye image shown), luminance gradient, and texture gradient.
Figure 6
Tilt estimation errors. (A) Ground-truth tilt for an example image (cf. Figure 5). (B) Ground-truth slant. Note that the gradient operators used to obtain estimates of ground-truth 3D orientations tend to overestimate slant of pixels near depth boundaries. This effect can be seen in the image regions abutting the foreground tree. (C) Optimal (MMSE) tilt estimates when all three cues are present. (D) Errors in tilt estimates. Tilt errors increase in magnitude as the slant approaches zero. (E) Median absolute tilt estimation errors as a function of ground-truth tilt and slant. For slants near zero, where tilt is undefined, tilt errors are large. Beyond approximately 20°, the pattern of tilt errors becomes nearly invariant to slant. (F) Tilt error as a function of ground-truth slant. As slant increases, tilt estimation error decreases systematically. The solid curve is for an analysis area with a diameter of 0.25°, the analysis area used throughout the rest of the article. At slants greater than 40°, the median tilt estimation error drops to approximately 15°.
Figure 7
Tilt prior in natural scenes and optimal single-cue tilt estimates, variance, error, and bias. Three-cue performance is also shown for cases in which all three cues agree. (A) Unsigned tilt prior in natural scenes. The tilt prior exhibits a strong cardinal bias. Slants about horizontal axes (tilt = 90°) are most probable (e.g., the ground plane straight ahead). Slants about vertical axes (tilt = 0° and 180°) are the next most probable. All other tilts are much less probable. (B) Distribution of optimal tilt estimates. Its shape is similar to the shape of the tilt prior. (C) Tilt estimates conditioned on individual image cue values and estimates conditioned on cases when all three cues agree. Specifically, blue indicates tilt given disparity alone E(ϕr|ϕd), green indicates tilt given luminance alone E(ϕr|ϕl), red indicates tilt given texture alone E(ϕr|ϕt), and black indicates the expected tilt value when all three cues agree. (D) The precision of the optimal estimates. Disparity alone yields the most reliable estimates for most, but not all, image cue values. When all three image cues agree, the precision of the optimal estimate is significantly increased (see Methods). (E, F) Median absolute error (magnitude) and bias of estimates as a function of image cue value. When all three image cues agree, there is a substantial increase in precision and a decrease in bias.
Figure 8
Two-cue optimal estimates and precision. (A) Optimal tilt estimates given disparity and luminance cue values: E(ϕr|ϕd,ϕl). Each line segment indicates the optimal tilt estimate (i.e., the expected tilt value). (B) Expected tilt (replotted from A) as a function of the disparity cue for different luminance cue values (see upper left inset). Specifically, when luminance and disparity cues always agree with each other E(ϕr|ϕd = ϕl), when luminance always equals 90° E(ϕr|ϕd,ϕl = 90), and when luminance and disparity cues differ by 90° E(ϕr | |ϕd − ϕl| = 90). (C) Expected tilt (also replotted from A) but as a function of the luminance cue (see lower right inset in B) for different disparity cue values. When the disparity cue equals 90°, luminance has almost no influence on the optimal estimate (disparity dominance). (D) Estimate precision (circular variance) based on measured disparity and luminance cue values. (Inset: von Mises distributions spanning the range of depicted circular variances.) (E, F) Circular variances for the same conditions as in B, C.
Figure 9
Three-cue optimal estimates and precision, for all values of luminance and texture when (A) disparity equals 45°, (B) disparity equals 90°, (C) disparity equals 135°, and (D) disparity equals 180°. Top row: The cue cube indicates the plane from which the optimal estimates are shown. Middle row: Line segment orientation indicates the tilt estimate given each particular combination of cue values. Bottom row: Circular variance of optimal estimates. The color bar is the same as in Figure 8D (i.e., circular variance on [0.1, 1.0]).
Figure 10
Three-cue estimates (replotted from Figure 9) for specific combinations of luminance and texture when (A) disparity equals 45°, (B) disparity equals 90°, (C) disparity equals 135°, and (D) disparity equals 180° (similar to Figure 9). Top row: Surface tilt estimates, for each disparity cue value, when luminance equals texture (black), luminance disagrees with texture by 90° (middle gray), and luminance equals 90° (light gray). For reference, light blue indicates the optimal estimate conditioned on disparity alone (see above), while light green indicates the optimal estimate conditioned on luminance alone when luminance equals 90°, E(ϕr|ϕl = 90°). Bottom row: Circular variance for the same conditions.
Figure 11
The influence of luminance and texture on tilt estimates from disparity. The difference between the all-cues estimates and the disparity-alone estimates is plotted as a heat map. (A) Dramatic departures from the disparity-alone estimate occur when luminance and texture agree and differ from disparity (except when disparity equals 90°). (B–E) As the difference between luminance and texture increases from (B) 22.5° through (C) 45.0° and (D) 67.5° to (E) 90.0°, the influence of luminance and texture progressively decreases. When luminance and texture are in maximal disagreement, |ϕl − ϕt| = 90°, they have little or no effect; that is, the three-cue estimates are almost identical to the disparity-alone estimates.
Figure 12
Comparison of the optimal conditional-means method and other estimators. (A) Grand histogram of errors for the optimal (black), luminance gradient cue only (green), texture gradient cue only (red), disparity gradient cue only (blue), linear reliability–based cue combination (dashed black), linear reliability–based cue combination with local disparity-specified distance as the auxiliary cue (dashed cyan), and linear reliability–based cue combination with local RMS contrast as the auxiliary cue (dashed cyan). (B) Mean absolute error as a function of ground-truth tilt. (C) Mean absolute error as a function of range slant (cf. Figure 6F). (D) Median tilt bias as a function of range tilt. (E) Median tilt bias as a function of range slant. To reduce clutter, the single-cue results are not shown in B and E.
Figure 13
Relative reliability of each individual gradient cue, averaged across tilt, as a function of different local auxiliary cues. (A) Disparity-specified distance. (B) RMS contrast. (C) Luminance. The averages across tilt are simply for purposes of illustrating broad trends. (D) Variance of each individual gradient cue estimator across tilt for different disparity-specified distances. The average relative reliability in A is obtained by computing the average inverse variance across tilt at a given disparity-specified distance.
Figure 14
Tilt estimates when the luminance and texture cues are equal and the disparity cue signals a tilt of 45°. If the disparity cue is vetoed (ignored), the estimates should fall on the dashed line. The black curve shows the MMSE optimal estimates, which largely veto disparity when luminance and texture agree. The dashed black curve shows the estimates based on linear cue combination (the LR estimator). For the LR estimator, disparity pulls the estimates in the direction of 45° when luminance/texture cue is in the range of 70° to 130°.
Figure 15
Slant-tilt prior in natural scenes, for two equivalent parameterizations of slant and tilt. Upper row: tilt = [0, 180), slant = [−90, 90); lower row: tilt = [0, 360), slant = [0, 90). (A) Joint prior distribution of slant-tilt in natural scenes. The color bar indicates the count in log10 (e.g., 5 indicates a count of 10⁵); some slant-tilt combinations are ∼100× more likely than others. High slants at tilt = 90° (e.g., ground plane straight ahead) are most probable. Zero-slant surfaces (where tilt is undefined) are also quite probable. (B) The marginal tilt prior distribution. (The upper plot shows exactly the same data as Figure 7B.) (C) The marginal slant prior distribution. The dashed black curve is a mixture of Gaussians fit to the slant prior (see Appendix for parameters). The gray curve is the marginal slant distribution computed without an area-preserving projection. The shaded areas (|slant| > 67.5°) indicate results that may be due to depth discontinuities rather than the surfaces of individual objects.
Figure 16
Comparison of texture cues for tilt estimation. (A, B) Synthesized image of a planar textured surface with a slant of 60° and a tilt of 90°. (C) Map of the tilt estimated at each pixel location in B for the major axis cue used here. The map is all white because the major axis of the spectrum is 90° at all locations. (D) Map of the tilt estimated at each pixel location in B for the frequency centroid gradient cue. (E, F) Histograms of the tilt estimates for the two cues. (G) Map of ground-truth tilt at each pixel location of an example natural image. (H) Map of estimated tilts for example natural image using major axis cue. (I) Map of estimated tilts for example image using frequency centroid gradient cue.
Figure A1
Geometry of circular variables. To compute the average from samples of a circular variable, the (four-quadrant) arc tangent is computed from the average cosine and the average sine of the sample angles. The angle of the average resultant vector is the mean angle and one minus the magnitude of the resultant vector is the circular variance. Plotted are samples from two distributions with different means and circular variances (black and gray symbols).
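The construction described in Figure A1 can be written down directly: average the cosines and sines of the sample angles, take the four-quadrant arc tangent of the resultant for the mean angle, and one minus the resultant length for the circular variance. A minimal sketch (not the authors' code):

```python
import numpy as np

def circular_mean_and_variance(angles_deg):
    """Mean angle and circular variance of a sample of circular data.

    The mean is the angle of the average resultant vector; the circular
    variance is one minus the resultant's magnitude (0 = perfectly
    concentrated, 1 = uniformly dispersed)."""
    a = np.radians(np.asarray(angles_deg, dtype=float))
    c, s = np.cos(a).mean(), np.sin(a).mean()
    mean = np.degrees(np.arctan2(s, c)) % 360.0   # four-quadrant arc tangent
    variance = 1.0 - np.hypot(c, s)               # one minus resultant length
    return mean, variance
```

Note that for an axial variable such as unsigned tilt on [0, 180), the angles would typically be doubled before averaging and the mean halved afterward, so that 0° and 180° are treated as the same direction.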
Figure A2
Range estimates from disparity. (A) Histogram of range-from-disparity estimates plotted against ground-truth range. The color bar indicates the log-base-10 number of samples in each bin. The fact that nearly all the samples lie on the positive oblique indicates that the disparity estimation routine (Equation A7) is largely accurate. (B) Mean (solid black curve) and median (dashed black curve) range estimates from disparity as a function of distance. Error bars show 68% confidence intervals of the mean.
