Abstract
While there is evidence that both visual salience and previously stored scene knowledge influence scene viewing behavior, the relationship between them and viewing behavior is unclear. The present study investigated the relationship between stimulus-based salience and knowledge-based scene meaning. Sixty-five participants performed a scene memorization task while their eye movements were recorded. Each participant viewed 40 real-world scenes for 12 seconds each. A duration-weighted fixation density map for each scene was computed across all 65 participants to summarize viewing behavior. A saliency map for each scene was computed using the Graph-Based Visual Saliency model (Harel, Koch, & Perona, 2006). Finally, a meaning map was generated for each scene using a new method in which people rated how informative/recognizable each scene region was on a 6-point Likert scale. Specifically, each scene was sampled using overlapping circular patches at two spatial scales (300 3° patches or 108 7° patches). Each unique patch was then rated by participants on Amazon Mechanical Turk (N=165), each of whom rated a random set of ~300 patches. The 3° and 7° rating maps were smoothed using interpolation and then averaged together to produce a meaning map for each scene. For each scene, the unique and shared variance in the fixation density map explained by the saliency and meaning maps was computed using multiple linear regression with salience and meaning as predictors. The squared partial correlation showed that on average meaning explained 50% (SD=11.9) of the variance in scene fixation density while salience explained 35% (SD=12.5). The squared semi-partial correlation indicated that on average meaning uniquely explained 19% (SD=10.6) of the variance in fixation density while salience uniquely accounted for only 4% (SD=4.0).
These results suggest that scene meaning is a better predictor of viewing behavior than salience, and that stored scene knowledge uniquely accounts for relevant scene regions not captured by salience alone.
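The variance partitioning described above can be illustrated with a minimal sketch. It assumes each map is a 2D array of the same shape and uses the standard textbook definitions of squared semi-partial and squared partial correlation, computed from the R² of nested least-squares regressions. The function names (`r_squared`, `partition_variance`) and the synthetic maps are hypothetical, and whether these definitions match the exact computation behind the reported percentages is an assumption.

```python
import numpy as np

def r_squared(y, predictors):
    """R^2 of an ordinary least-squares fit of y on the predictors plus an intercept."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

def partition_variance(fixation, meaning, saliency):
    """Split variance in a fixation density map between a meaning map and a saliency map."""
    y, m, s = (np.asarray(a, dtype=float).ravel() for a in (fixation, meaning, saliency))
    r2_full = r_squared(y, [m, s])   # both predictors
    r2_m = r_squared(y, [m])         # meaning alone
    r2_s = r_squared(y, [s])         # saliency alone
    return {
        "r2_full": r2_full,
        "meaning_alone": r2_m,
        "saliency_alone": r2_s,
        # Squared semi-partial: gain in R^2 when the predictor is added last.
        "meaning_unique": r2_full - r2_s,
        "saliency_unique": r2_full - r2_m,
        # Squared partial: the same gain, relative to what the other predictor left unexplained.
        "meaning_partial": (r2_full - r2_s) / (1.0 - r2_s),
        "saliency_partial": (r2_full - r2_m) / (1.0 - r2_m),
        # Variance the two predictors explain jointly.
        "shared": r2_m + r2_s - r2_full,
    }

# Synthetic maps for illustration only; these are NOT the study's data.
rng = np.random.default_rng(0)
shape = (48, 64)
meaning_map = rng.normal(size=shape)
saliency_map = 0.6 * meaning_map + 0.8 * rng.normal(size=shape)  # correlated with meaning
fixation_map = 0.8 * meaning_map + 0.1 * saliency_map + 0.5 * rng.normal(size=shape)

result = partition_variance(fixation_map, meaning_map, saliency_map)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

In this sketch the two unique components and the shared component sum to the full-model R², which makes the decomposition easy to check for any single scene.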
Meeting abstract presented at VSS 2017