Abstract
Visual search in naturalistic scenes is highly efficient. One crucial reason for this efficiency is attentional guidance by our knowledge of the regularities that govern those scenes, their "scene grammar". Our study investigated the hierarchical organization of this scene grammar, focusing on "anchor objects", which we hypothesize are essential building blocks of environmental scenes that predict the locations of other objects (e.g., the sink predicting the soap on top of it, the shower predicting the shampoo inside it). In a virtual reality eye-tracking study, we had 24 participants search for targets in navigable, immersive scene environments that were manipulated with respect to both the presence of anchor objects and their syntactic composition. We found that concealing the semantic identity of anchors (by replacing them with grey cuboids of matching dimensions) slowed target localization in syntactically consistent scenes (i.e., scenes with objects placed in expected locations). This slowing was apparent in the time to first target fixation, the number of fixations, and the scanpath length of search trials. Furthermore, our motion-tracking data showed that participants moved more extensively in consistent scenes without intact anchors, suggesting a greater need for costly body movements in the absence of anchor context. In scenes with inconsistent syntax (i.e., scenes in which objects' arrangements violated expectations), where search was overall much slower, the anchor manipulation had the opposite effect: Replacing anchors with cuboids speeded search and decreased movement, indicating that anchors lost their guiding function and became useless clutter in such scenes. These results show that the semantic identity and syntactic placement of anchor objects are vital, interacting components of a scene's grammar that efficiently guide both eye and body movements when we search for objects, bringing us a step closer to uncovering the hierarchical nature of scene priors and their role in efficient real-world search.