Abstract
Generating a detailed percept of the three-dimensional (3D) scene underlying any visual stimulus is among the most important tasks of natural vision. It has long been suggested that the visual system uses a set of modules to derive 3D information via complex image-based processing. However, missing from these approaches is a full understanding of the extraordinarily complex statistics of 3D natural scenes. In this work, we acquired a large database of high-resolution 3D natural scenes and examined their statistics. We first sampled a large number of scene patches (~2 degrees of visual angle) from the database and fitted the 3D range data in each patch with a concatenation of 8th-order polynomial functions. We then identified all 3D natural scene patches with distinctive distributions of ranges (referred to as 3D natural scene structures). Two 3D scene patches were deemed to have the same distribution of ranges if they could be transformed into each other by an affine transform (displacement, rotation, and scaling). The rationale was to remove range variations due to uniform changes in viewing angle and surface shape. Finally, we examined the frequencies of occurrence of these structures and their compositional patterns in natural scenes, and developed a probabilistic model for each of them. To demonstrate the utility of these 3D natural scene structures, we used them to estimate 3D scenes from 2D images and to categorize 3D natural scenes. Our results showed that accurate 3D vision from a single monocular view is achievable in many situations, and that near-human performance can be achieved in categorizing 3D natural scenes with the obtained structures. We thus conclude that the 3D natural scene structures obtained here faithfully capture the extraordinarily complex statistics of 3D natural scenes in a way that supports a range of tasks of natural vision.
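The equivalence criterion above (two range patches are the same structure if one maps onto the other under displacement, rotation, and scaling) corresponds to alignment under a similarity transform. The sketch below is our own illustration of such a test, not the authors' code: it uses the standard Umeyama/Procrustes method to find the best scaled rotation and translation between two 3D point sets and reports the residual; a small residual would indicate the patches share the same range distribution up to the allowed transform. The function name and the residual threshold are assumptions for illustration.

```python
import numpy as np

def similarity_residual(X, Y):
    """RMS residual after aligning 3D point set X to Y with the best
    similarity transform (rotation R, scale s, translation t),
    computed via the Umeyama/Procrustes method.
    X, Y: (N, 3) arrays of corresponding range samples (illustrative)."""
    muX, muY = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - muX, Y - muY
    # SVD of the cross-covariance gives the optimal rotation.
    U, S, Vt = np.linalg.svd(Yc.T @ Xc / len(X))
    d = np.sign(np.linalg.det(U @ Vt))        # guard against reflections
    D = np.diag([1.0, 1.0, d])
    R = U @ D @ Vt
    varX = (Xc ** 2).sum() / len(X)
    s = (S[0] + S[1] + d * S[2]) / varX        # optimal isotropic scale
    t = muY - s * (R @ muX)
    aligned = s * (R @ X.T).T + t
    return np.sqrt(((aligned - Y) ** 2).sum(axis=1).mean())

# Hypothetical usage: two patches related by a pure similarity
# transform should align with near-zero residual.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1                              # make Q a proper rotation
Y = 1.7 * (Q @ X.T).T + np.array([0.3, -1.0, 2.0])
print(similarity_residual(X, Y) < 1e-6)
```

In practice one would also need to establish point correspondences between patches before alignment; this sketch assumes they are given.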
Meeting abstract presented at VSS 2012