Abstract
Recent studies have shown that, during natural viewing, humans tend to direct their gaze toward objects that are semantically similar either to the currently fixated object or to the requested search target (Hwang et al., 2011). This result, however, may be confounded by knowledge of the scene gist, which does not require object recognition (Torralba et al., 2006), or by spatial dependency among objects (Oliva & Torralba, 2007). To clarify the contributions of these factors to semantic guidance, subjects were asked to view a series of displays from which the scene gist had been removed. Each display was generated by segmenting 15 objects from a natural scene in the LabelMe database. To remove the scene gist, all objects were pasted onto a grey canvas. The objects were placed either at the same coordinates as in the original scene, so that spatial dependency was preserved without scene gist (fixed condition), or at randomly selected locations on the canvas so that spatial dependency was also removed (scrambled condition). Results were compared with a control condition using pseudo-randomly located "fixations." The results show that, without scene gist, the preserved spatial dependency among objects (fixed condition) still provided additional semantic information and resulted in stronger semantic guidance (ROC = 0.68) than the scrambled condition (ROC = 0.6), in which both scene gist and spatial dependency were removed. Interestingly, semantic guidance in both conditions was stronger than in the control condition (ROC = 0.5). Our results confirm the previous finding of semantic guidance and show that it is not entirely due to either scene gist or spatial dependency among objects. Even without scene gist or spatial dependency, subjects could still retrieve semantic information to guide their attention. This strategy may improve the efficiency of visual search and scene understanding.
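The ROC values above can be read as areas under an ROC curve, where 0.5 is chance performance and 1.0 is perfect separation. As a minimal sketch, assuming that each object receives a semantic-similarity score relative to the currently fixated object and that the measure asks how well these scores separate the next-fixated objects from the non-fixated ones, the area can be computed as a pairwise (Mann-Whitney) probability. The function name `roc_auc` and the example scores are hypothetical illustrations, not the authors' analysis code or data.

```python
from itertools import product


def roc_auc(fixated_scores, nonfixated_scores):
    """Area under the ROC curve (hypothetical helper).

    Computed as the probability that a randomly chosen fixated object has a
    higher semantic-similarity score than a randomly chosen non-fixated
    object; ties count as one half. A value of 0.5 means the scores do not
    distinguish fixated from non-fixated objects (chance level).
    """
    if not fixated_scores or not nonfixated_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for f, n in product(fixated_scores, nonfixated_scores):
        if f > n:
            wins += 1.0
        elif f == n:
            wins += 0.5
    return wins / (len(fixated_scores) * len(nonfixated_scores))


# Hypothetical similarity scores (assumed for illustration only).
fixated = [0.8, 0.6, 0.7]
nonfixated = [0.2, 0.6, 0.3, 0.5]
print(roc_auc(fixated, nonfixated))  # 11.5 / 12, about 0.958
```

Under this reading, a control condition with pseudo-randomly placed "fixations" would be expected to give scores for "fixated" and non-fixated objects drawn from the same distribution, which is why its area stays near 0.5.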
Meeting abstract presented at VSS 2013