Abstract
General scene knowledge (our "scene grammar") plays an important role in both identifying and locating objects in the real world. This knowledge reflects co-occurrences of scene elements and their structural regularities. Some objects appear more frequently within a specific context, e.g. a toothbrush in a bathroom rather than in a bedroom. When trying to locate an object, however, predicting the spatial relationships between objects within a single scene is key to efficient search. We propose that the arrangement of objects is not only rule-governed but also hierarchical in structure. In particular, we believe that some objects within each scene category function as anchors, carrying strong spatial predictions regarding other objects within the scene (e.g. the stove anchors the position of the pot). These "anchors" therefore constitute key elements in the hierarchy of objects in scenes. To test this hypothesis and to quantify the spatial relationships between objects in different scene categories, we extracted the spatial locations of objects from an image database. Inspired by graph theory, we captured the relationships between objects as a set of nodes connected by edges of varying weights. As a first approximation, our weights were set by 1) the overall frequency of an object-to-object pairing, 2) the mean distance between these objects across many instances of a scene category, and 3) the standard deviation of the vertical relationship between the objects (above/below). Stronger weights indicate a stronger spatial relationship between two objects. Based on these weights combined with cluster analyses, we identified "anchor" objects. We tested the behavioral relevance of the weight parameters by correlating them with search performance. Results show that reaction time decreases as weights increase. We take this as evidence that anchors predict single-trial search performance for other objects in naturalistic scenes.
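The three weight components named in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' pipeline: the input format (one dictionary per scene mapping an object label to a normalized (x, y) centre, with y increasing downward), the function names `pair_statistics` and `edge_weight`, the use of the population standard deviation, and in particular the rule that combines the three components into a single weight, which the abstract does not specify. The cluster analysis used to identify anchors is not reproduced here.

```python
from collections import defaultdict
from itertools import combinations
from math import hypot
from statistics import mean, pstdev


def pair_statistics(scenes):
    """Per object pair: co-occurrence frequency, mean distance, and
    SD of the vertical offset across scenes of one category.

    `scenes` is a list of dicts {label: (x, y)} in normalized image
    coordinates (hypothetical format, one instance per label per scene).
    """
    distances = defaultdict(list)
    vertical_offsets = defaultdict(list)
    for scene in scenes:
        # Sorting the labels gives every pair a stable (a, b) key.
        for a, b in combinations(sorted(scene), 2):
            (xa, ya), (xb, yb) = scene[a], scene[b]
            distances[(a, b)].append(hypot(xa - xb, ya - yb))
            vertical_offsets[(a, b)].append(ya - yb)

    stats = {}
    for pair, dists in distances.items():
        stats[pair] = {
            # Fraction of scenes in which both objects are present.
            "frequency": len(dists) / len(scenes),
            "mean_distance": mean(dists),
            # Low SD means a consistent above/below arrangement.
            "vertical_sd": pstdev(vertical_offsets[pair]),
        }
    return stats


def edge_weight(pair_stats):
    """Assumed combination rule (not taken from the abstract): the weight
    grows with frequency and shrinks with distance and vertical variability."""
    return pair_stats["frequency"] / (
        (1.0 + pair_stats["mean_distance"]) * (1.0 + pair_stats["vertical_sd"])
    )


# Made-up toy data: three kitchen scenes.
scenes = [
    {"stove": (0.5, 0.6), "pot": (0.5, 0.5), "towel": (0.9, 0.4)},
    {"stove": (0.3, 0.6), "pot": (0.3, 0.5), "towel": (0.1, 0.8)},
    {"stove": (0.7, 0.6), "pot": (0.7, 0.5)},
]
stats = pair_statistics(scenes)
weights = {pair: edge_weight(s) for pair, s in stats.items()}
```

In this toy data the pot always sits directly above the stove, so the pot-stove edge receives a larger weight than the pot-towel edge, whose pairing is rarer, more distant, and vertically inconsistent. Under the same assumptions, the resulting weights could then be correlated with search reaction times, as the abstract describes.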
Meeting abstract presented at VSS 2018