Abstract
Objects in real-world scenes follow a set of rules, a "scene grammar", that allows us to interact with our environment with ease. However, the particulars of this set of rules are not fully understood. We propose that scene grammar is hierarchically structured. As a working hypothesis, we primarily differentiate between three distinct levels: scenes, anchor objects, and objects. Anchor objects are distinguishable from other objects in that they are generally large, salient, and diagnostic of a scene. More importantly, these objects serve as anchors for spatial predictions regarding many of the other objects in the scene (e.g., the pot is found on the stove, the soap is found in the shower), allowing for efficient object perception and search. Understanding the structure of scene grammar therefore requires understanding how these levels interact and work together. In an EEG experiment, we asked participants to view two images presented one after another and to judge whether they were semantically congruent or incongruent. Importantly, the first image, i.e. the prime, was either a scene (e.g., a kitchen) or a non-anchor object (e.g., a pan). The second image was always an anchor object, either consistent (a stove) or inconsistent (a shower) with the prime. The N400 ERP component, originally known from language processing, signals semantic integration costs. Therefore, if the prime activates semantic predictions regarding the anchor, violations of these expectations should result in an increased N400 in response to the onset of the anchor. Moreover, the stronger the predictions activated by the prime, the greater the N400 should be. Interestingly, we found a larger N400 in response to the anchor when observers were initially primed with non-anchor objects rather than with scenes. This indicates that objects generate stronger predictions about anchors than do the scenes in which the anchors are contained.
Meeting abstract presented at VSS 2016