Abstract
Estimating range (egocentric distance) is a difficult problem that is generally approached by combining well-known depth cues. Here, we take a different approach that exploits statistical relationships between local image structure and depth structure. First, we collected 12 high-resolution RGB images co-registered with precise range data using a RIEGL VZ-400 range scanner combined with a Nikon D700 camera. We subdivided these images and the corresponding range maps into 16 x 16 patches. The range patches were pre-processed (normalization, gradient extraction) and then clustered (k-means) to form a "dictionary" of canonical range structures. Then, 11 of the images were used to train an optimal (Bayesian) classifier to associate image patches (or, more precisely, the corresponding outputs of a V1-like wavelet filter bank) with these canonical range structures. The 12th range map was then reconstructed in two steps: first, the Bayesian classifier selected the most likely range patch for each image patch; second, the selected range patches were stitched together via simple averaging. The reconstructed range maps look remarkably like the actual range maps, given that no conventional depth cues (including binocular disparity) were used. Quantitative comparison shows that the estimated range values correlate fairly well with the ground-truth range values. This implies that a biological visual system might be able to make a coarse estimate of range structure in the environment using only the retinal image at hand and associations between image structure and true range structure, associations that could be learned via interaction with the environment over either developmental or evolutionary timescales.
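The pipeline described above (patch extraction, a k-means dictionary of range patches, a Bayesian classifier from image patches to dictionary entries, and stitching by averaging) can be illustrated with a minimal sketch in Python with NumPy. This is not the authors' implementation, and nearly every specific in it is an assumption made for illustration: the scenes are synthetic toy data in which image intensity is a noisy function of range, raw image-patch pixels stand in for the V1-like wavelet filter outputs, the range pre-processing (normalization, gradient extraction) is omitted, the classifier is a simple Gaussian model with a shared diagonal variance, and the patch size, overlap, and dictionary size are arbitrary. All function names are hypothetical.

```python
import numpy as np

def extract_patches(img, size, step):
    """Flattened size x size patches plus their top-left corners."""
    h, w = img.shape
    corners = [(r, c) for r in range(0, h - size + 1, step)
               for c in range(0, w - size + 1, step)]
    patches = np.stack([img[r:r + size, c:c + size].ravel() for r, c in corners])
    return patches, corners

def nearest(x, centers):
    """Index of the closest center (squared Euclidean distance) per row."""
    d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(x, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns cluster centers and final labels."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = nearest(x, centers)
        for j in range(k):
            if np.any(labels == j):          # leave empty clusters untouched
                centers[j] = x[labels == j].mean(axis=0)
    return centers, nearest(x, centers)

def fit_classifier(f, labels, k):
    """Assumed model: Gaussian classes with a shared diagonal variance."""
    means = np.zeros((k, f.shape[1]))
    log_prior = np.full(k, -np.inf)          # empty classes can never win
    for j in range(k):
        members = f[labels == j]
        if len(members):
            means[j] = members.mean(axis=0)
            log_prior[j] = np.log(len(members) / len(f))
    var = f.var(axis=0) + 1e-6               # crude shared variance estimate
    return means, var, log_prior

def classify(f, means, var, log_prior):
    """Maximum a posteriori dictionary index for each feature vector."""
    log_lik = -0.5 * (((f[:, None, :] - means[None, :, :]) ** 2) / var).sum(axis=2)
    return (log_lik + log_prior).argmax(axis=1)

def stitch(patches, corners, shape, size):
    """Average overlapping patches back into a full map."""
    acc, cnt = np.zeros(shape), np.zeros(shape)
    for p, (r, c) in zip(patches, corners):
        acc[r:r + size, c:c + size] += p.reshape(size, size)
        cnt[r:r + size, c:c + size] += 1
    return acc / np.maximum(cnt, 1)          # uncovered pixels stay at 0

def toy_scene(rng, n=64):
    """Synthetic stand-in for one co-registered image / range-map pair."""
    yy, xx = np.mgrid[0:n, 0:n] / float(n)
    a, b, ph = rng.uniform(1, 4), rng.uniform(1, 4), rng.uniform(0, np.pi)
    depth = 5 + 2 * np.sin(2 * np.pi * a * xx + ph) * np.cos(2 * np.pi * b * yy)
    image = 0.5 * depth + 0.1 * rng.standard_normal(depth.shape)
    return image, depth

def demo(size=8, step=4, k=16, seed=0):
    """Train on 11 toy scenes, reconstruct the 12th, report the correlation."""
    rng = np.random.default_rng(seed)
    scenes = [toy_scene(rng) for _ in range(12)]
    train, (test_image, test_depth) = scenes[:11], scenes[11]
    feats = np.vstack([extract_patches(im, size, step)[0] for im, _ in train])
    ranges = np.vstack([extract_patches(d, size, step)[0] for _, d in train])
    dictionary, labels = kmeans(ranges, k, seed=seed)
    model = fit_classifier(feats, labels, k)
    test_feats, corners = extract_patches(test_image, size, step)
    chosen = dictionary[classify(test_feats, *model)]
    recon = stitch(chosen, corners, test_depth.shape, size)
    r = np.corrcoef(recon.ravel(), test_depth.ravel())[0, 1]
    return recon, test_depth, r
```

Calling `demo()` returns the reconstructed map, the held-out ground-truth map, and their Pearson correlation. Because the toy images encode range directly, the resulting value says nothing about performance on real scenes; the sketch only shows how the stages fit together.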
Meeting abstract presented at VSS 2012