Abstract
The sources that guide attention in real-world scenes are manifold and interact in complex ways. We have long argued that attention during scene viewing is mainly controlled by generic scene knowledge regarding the meaningful composition of objects that make up a scene (a.k.a. scene grammar). Unlike arbitrary target objects placed in random arrays of distractors, objects in naturalistic scenes are arranged in a highly rule-governed manner. In this talk, I will highlight some recent studies from my lab in which we have tried to shed more light on the hierarchical nature of scene grammar. In particular, we have found that scenes can be decomposed into smaller, meaningful clusters of objects, which we have started to call "phrases". At the core of these phrases are so-called "anchor objects", which are often larger, stationary objects that anchor strong relational predictions about where other objects within the phrase are expected to be. Thus, within a "phrase" the spatial relations of objects are strongly defined. By manipulating the presence of anchor objects, we were able to show that both eye movements and body locomotion are strongly guided by these objects when actions are carried out in naturalistic 3D settings. Overall, the data I will present provide further evidence for the crucial role that anchor objects play in structuring the composition of scenes and thereby critically affecting visual search, object perception and the formation of memory representations in naturalistic environments.