Abstract
Natural visual scenes consist of objects of various physical properties that are arranged in three-dimensional (3D) space in a variety of ways. The higher-order statistics of this environment, with which we interact daily, are crucial for understanding 3D vision, spatial navigation, spatial memory, and the underlying neural mechanisms. We acquired a database of natural scenes that includes both distance and luminance information. To examine the statistics of 3D natural scenes, we transformed the range images into XYZ images in Cartesian coordinates and sampled a large number of patches from the XYZ images. We obtained the independent components (ICs) of the sampled patches, fitted 3D surfaces to the ICs, and examined the probability distributions of the distances, orientations, and curvatures of the fitted 3D ICs. Using these 3D ICs, we compiled a large set of 3D natural scene structures (NSSs), i.e., spatial concatenations of 3D ICs, and used them to categorize 3D natural scenes. To this end, we cropped 1,000 patches from the large-scale 3D natural scenes and divided them evenly into 10 classes. We then calculated the information content of each 3D NSS, i.e., the rate at which the 3D scenes in the training set were recognized based on the NSS, and selected a set of NSSs that had high information content for each class of 3D scenes. Finally, we used the selected 3D NSSs to categorize natural scenes. We found that: 1) this model achieved a high categorization performance; 2) NSSs are more efficient than simple features in conveying information about scene identities and scene categories; and 3) spatial concatenations of individual NSSs and multiple NSSs are important for 3D scene categorization. Thus, we conclude that the 3D NSSs obtained here are a useful set of code words for encoding complex 3D natural scenes.
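The first preprocessing step described above, transforming range images into XYZ images and sampling patches from them, might be sketched as follows. This is only an illustrative sketch, not the procedure used in the study: the scanner geometry (range measured along rays whose azimuth and elevation vary linearly across the image), the angular spans, and the function names `range_to_xyz` and `sample_patch` are all assumptions introduced here.

```python
import math


def range_to_xyz(range_image, az_span_deg=360.0, el_span_deg=90.0):
    """Convert a 2D range image to a 2D grid of (x, y, z) tuples.

    Assumption (hypothetical geometry): rows index elevation, columns index
    azimuth, and both angles vary linearly across the image. Each pixel holds
    the distance along its viewing ray.
    """
    n_rows = len(range_image)
    n_cols = len(range_image[0])
    xyz = []
    for i, row in enumerate(range_image):
        # Elevation of the pixel centre: top rows look up, bottom rows look down.
        el = math.radians(el_span_deg * (0.5 - (i + 0.5) / n_rows))
        out_row = []
        for j, r in enumerate(row):
            # Azimuth of the pixel centre, zero at the middle of the image.
            az = math.radians(az_span_deg * ((j + 0.5) / n_cols - 0.5))
            x = r * math.cos(el) * math.sin(az)
            y = r * math.sin(el)
            z = r * math.cos(el) * math.cos(az)
            out_row.append((x, y, z))
        xyz.append(out_row)
    return xyz


def sample_patch(xyz, top, left, size):
    """Crop a size-by-size patch from an XYZ image (hypothetical helper)."""
    return [row[left:left + size] for row in xyz[top:top + size]]


# Usage: a tiny synthetic range image with every point 2 units away.
ranges = [[2.0] * 4 for _ in range(4)]
xyz_image = range_to_xyz(ranges)
patch = sample_patch(xyz_image, top=1, left=1, size=2)
```

Under this assumed geometry, the Euclidean norm of each XYZ point equals the original range value, which gives a quick sanity check on the conversion. Patches cropped this way would then be the input to the independent component analysis.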
Meeting abstract presented at VSS 2013