Antonio Torralba, Aude Oliva; Depth perception from familiar scene structure. Journal of Vision 2002;2(7):494. doi: https://doi.org/10.1167/2.7.494.
In the absence of cues for absolute depth measurement, such as binocular disparity, motion, or defocus, the absolute distance between the observer and a scene cannot be measured directly. The interpretation of shading, edges, and junctions may provide a 3D model of the scene, but it does not specify the actual scale of the space. Nevertheless, observers have no difficulty estimating absolute depth from a single image.
To evaluate the ability of subjects to estimate the mean depth of a scene from monocular information, we ran two experiments. First, ten subjects were asked to estimate the mean depth of 500 pictures covering the full range of depths, from close-up views of objects to panoramic views of large open spaces. For man-made scenes, 85% of the estimates fell within one decade (one order of magnitude) of the true mean depth; for natural scenes, performance was poorer (67%). Second, we evaluated depth estimation in a ranking task: ten subjects were asked to order 20 sets of 9 pictures each according to their mean depth. The rank correlation between subjects was 0.97 for man-made environments and 0.84 for natural scenes, showing that different subjects produce nearly identical orderings for pictures of man-made structures. Subjects performed significantly better with man-made than with natural scenes.
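The agreement measure in the ranking task can be illustrated with a short sketch of the Spearman rank correlation between two subjects' orderings of a 9-picture set. The picture orderings below are invented for illustration; only the 9-picture set size comes from the abstract.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation for two rankings without ties."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Rank (1 = smallest mean depth) each subject assigned to pictures 0..8.
# These two hypothetical subjects disagree only on a few neighboring swaps.
subject_1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
subject_2 = [1, 3, 2, 4, 5, 6, 8, 7, 9]

print(round(spearman_rho(subject_1, subject_2), 3))  # → 0.967
```

A correlation near 1, as in the 0.97 reported for man-made scenes, means subjects order the pictures almost identically.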
One possible source of information for absolute depth estimation is the image size of known objects. However, exploiting it is computationally complex because it requires object recognition. Here we propose a new source of information for absolute depth estimation that does not rely on specific objects: a computational procedure based on recognition of the scene structure. The model, which uses a holistic representation of the scene, yields performance comparable to that of human subjects (rank correlations between the orderings produced by the algorithm and by subjects are 0.88 for man-made environments and 0.81 for natural environments).