Abstract
Estimating the 3D shape of objects is a critical task for sighted organisms. It is therefore important to understand how different image cues should be combined for optimal shape estimation. Here, we examine how gradients of several image cues (disparity, luminance, and texture) should be combined to estimate surface tilt in natural scenes. Surface tilt is the direction in which a surface recedes most rapidly; for example, the ground plane straight ahead has a surface tilt of 90 deg. Estimating surface tilt is necessary for recovering surface orientation and 3D shape. To determine how image cues to surface tilt should be optimally combined, we collected a database of stereoscopic natural images with precisely registered range images, using a robotically positioned DSLR camera and a laser range scanner. For each pixel in each registered image (~10^9 samples), we computed the gradients of range, disparity, luminance, and texture within a local area (0.6 deg). Then, we computed the conditional mean of the range-gradient orientation (the ground-truth surface tilt), given the orientations of the image gradients. These conditional means are the Bayes-optimal (MMSE) estimates of surface tilt given the image cues, and they are free of assumptions about the shapes of the underlying joint probability distributions. A rich set of results emerges. First, the prior probability distribution over surface tilts in natural scenes exhibits a strong cardinal bias. Second, the likelihoods for disparity, luminance, and texture each yield somewhat biased estimates of surface tilt. Third, the optimal estimates of surface tilt are more biased than the likelihoods, indicating a strong influence of the prior. Fourth, when all three image cues agree, the optimal estimates become nearly unbiased. Fifth, when the luminance and texture cues agree, they often override disparity in the estimate of surface tilt, but when they disagree, they have little effect.
Meeting abstract presented at VSS 2014
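The estimation procedure described in the abstract amounts to computing conditional circular means of the ground-truth tilt within joint bins of the image-cue orientations. The sketch below is a minimal illustration of that idea and is not the authors' code; the function names, the 10-deg bin width, and the 360-deg circular treatment of tilt are assumptions introduced here for illustration.

import numpy as np

def circular_mean_deg(angles_deg, period=360.0):
    # Circular mean of angles (in degrees) defined over the given period.
    wrapped = np.deg2rad(np.asarray(angles_deg) * (360.0 / period))
    mean = np.rad2deg(np.arctan2(np.sin(wrapped).mean(), np.cos(wrapped).mean()))
    return (mean % 360.0) * (period / 360.0)

def mmse_tilt_estimates(tilt_true, tilt_disp, tilt_lum, tilt_tex,
                        bin_width=10.0, period=360.0):
    # Sample-based MMSE estimate: the conditional circular mean of the
    # ground-truth (range-gradient) tilt, given the jointly binned
    # orientations of the disparity, luminance, and texture gradients.
    # Inputs are 1D arrays of orientations in degrees, one entry per pixel.
    edges = np.arange(0.0, period + bin_width, bin_width)
    d_bin = np.digitize(np.asarray(tilt_disp) % period, edges) - 1
    l_bin = np.digitize(np.asarray(tilt_lum) % period, edges) - 1
    t_bin = np.digitize(np.asarray(tilt_tex) % period, edges) - 1
    tilt_true = np.asarray(tilt_true)

    estimates = {}
    for key in set(zip(d_bin, l_bin, t_bin)):
        in_cell = (d_bin == key[0]) & (l_bin == key[1]) & (t_bin == key[2])
        centers = tuple(float(edges[k]) + bin_width / 2.0 for k in key)
        estimates[centers] = circular_mean_deg(tilt_true[in_cell], period)
    return estimates

Returned estimates map each (disparity, luminance, texture) orientation-bin center to the conditional mean tilt in that cell, which is the sample-based MMSE estimate for that combination of cue orientations.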