Abstract
Any single object can produce a wide variety of images on the retinae, depending on its location in the world: the viewpoint, distance, and illumination dramatically alter the shape, size, and color of the retinal image. Yet, we are able to recognize objects within cluttered visual environments at a single glance. What might underlie this ability is that the appearance of an object varies with its location in a predictable manner. The size of an object’s retinal image, for instance, is inversely proportional to its distance from the observer. Through a series of online behavioral experiments (total N=240), we asked whether the visual system capitalizes on such predictable relationships to aid object recognition. Participants were briefly presented with pixelated objects (14 categories, 6 exemplars each) placed in the far or near plane of outdoor scene photographs, and had to indicate whether the object was animate or inanimate. Critically, we manipulated the size congruency of the objects by swapping the positions of the ‘far’ and ‘near’ (i.e., small and large) objects within the scenes. Virtually all participants were both faster and more accurate at classifying the animacy of size-congruent objects (e.g., a large image of an object in the near plane) compared to size-incongruent objects (e.g., a large image of an object in the far plane). In subsequent control experiments, objects were presented (1) without the scene, preserving the size and spatial positioning of the objects; or (2) with a partial cut-out of the scene, preserving local contrasts between object and scene background while hampering the extraction of depth. The behavioral advantage for size-congruent over size-incongruent objects was clearly larger when objects were presented within scenes, compared to these two control conditions. These findings demonstrate that real-world object size – as inferred from pictorial depth cues in the scene context – contributes to object recognition.
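To make the size–distance relation above concrete (this is standard viewing geometry, not a formula taken from the study itself): a frontal object of physical size $s$ viewed from distance $d$ subtends a visual angle

\[
\theta = 2\arctan\!\left(\frac{s}{2d}\right) \approx \frac{s}{d} \quad \text{for } d \gg s,
\]

so retinal image size falls off approximately as $1/d$; halving the viewing distance roughly doubles the size of the retinal image.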