Jacopo Turini, Melissa L.-H. Vo; Effects of spatial layout and object content on visual scene recognition. Journal of Vision 2020;20(11):1070. doi: https://doi.org/10.1167/jov.20.11.1070.
© ARVO (1962-2015); The Authors (2016-present)
Recognizing the environment around us is a fundamental step that our cognitive system performs to efficiently accomplish tasks like object recognition, search, and navigation. Despite being complex stimuli, scenes are recognized quickly and accurately. There is evidence that this efficiency relies mainly on a global representation that preserves the spatial layout of a scene while discarding information about its object content. However, some objects within a scene play an important role in defining the semantics and layout of that context. These objects, called “anchors”, are generally larger, not easily movable, and hold predictions about the presence and location of other objects. In this study, we compared recognition performance for scenes varying in the availability of spatial layout and object content, e.g., 1) multiple isolated anchors left in their natural arrangement (preserving spatial information while adding content information), 2) spatially rearranged anchors (natural spatial layout is disrupted, but content information remains present), and 3) global textures of these scenes (preserved spatial layout, but no meaningful content). In a behavioral experiment, we briefly presented participants with images of scenes containing different levels of spatial and content information and asked them to categorize the stimuli. Each scene was followed by the image of an object that was either semantically consistent or inconsistent with the scene. Results show that, compared to full scenes, scene recognition performance was not substantially diminished when only isolated anchors were shown; however, rearranging them led to drops in performance. The mere layout of the scene without object content further decreased performance to chance level. The semantic consistency of objects following scene presentation affected all scene conditions but exhibited the greatest effects when no content was present in the scene.
We conclude that scene recognition relies on both content and spatial information and highlight the role of anchors in bringing together these two fundamental dimensions.