Abstract
We have recently shown that meaning is a better predictor of overt visual attention than salience during scene memorization and aesthetic judgment tasks (Henderson & Hayes, 2017). The present study investigated whether meaning- and salience-based guidance are suppressed while performing an arbitrary visual search task. Thirty-eight participants viewed 80 real-world scenes for 12 seconds each and searched for embedded letter L targets. Half of the scenes contained one or two randomly placed letter targets; the other 40 scenes contained no targets, to avoid excessive fixations on targets. Only the fixation data from the 40 no-target scenes were analyzed. For each scene, a fixation density map was computed across all 38 participants to summarize scene viewing behavior. A saliency map for each scene was computed using the Graph-Based Visual Saliency model (Harel, Koch, & Perona, 2006). Finally, a meaning map was generated for each scene from human ratings, on a 6-point Likert scale, of how informative/recognizable isolated scene regions were. Scene regions were sampled using overlapping circular patches at two spatial scales. Each unique patch was rated three times by Amazon Mechanical Turk workers (N=165), each of whom rated a random set of 300 patches. The spatial rating maps were smoothed using interpolation and averaged together to produce a meaning map for each scene. The squared linear correlation between each scene's fixation density map and the corresponding meaning and salience maps was computed. Meaning explained 18% (SD=10.8) of the variance in scene fixation density, while salience explained 13% (SD=9.1). This is almost a 3-fold reduction relative to the scene memorization and aesthetic judgment tasks of Henderson and Hayes (2017). These results suggest that scene-orthogonal task demands can suppress both meaning- and salience-based guidance during scene viewing.
Meeting abstract presented at VSS 2018
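
As a concrete illustration of the analysis described above, the following is a minimal sketch (not the authors' code) of how the squared linear correlation between a scene's fixation density map and a meaning or salience map could be computed, assuming each map is stored as a 2D NumPy array of matching shape; the function and variable names are hypothetical.

```python
# Sketch only: squared linear correlation (R^2) between a fixation
# density map and a predictor map (meaning or salience), both assumed
# to be 2D NumPy arrays of the same shape. Names are hypothetical.
import numpy as np
from scipy.stats import pearsonr

def squared_linear_correlation(fixation_density: np.ndarray,
                               predictor_map: np.ndarray) -> float:
    """Return R^2 between the two maps, treated as flattened pixel vectors."""
    r, _ = pearsonr(fixation_density.ravel(), predictor_map.ravel())
    return r ** 2

# Hypothetical usage across the 40 no-target scenes, where fixation_maps,
# meaning_maps, and salience_maps are lists of 2D arrays:
# meaning_r2 = [squared_linear_correlation(f, m)
#               for f, m in zip(fixation_maps, meaning_maps)]
# salience_r2 = [squared_linear_correlation(f, s)
#                for f, s in zip(fixation_maps, salience_maps)]
# print(np.mean(meaning_r2), np.std(meaning_r2))
```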