Abstract
The role of semantic information in eye-movement control is increasingly recognized. One prototypical effect is particularly well studied: objects that are semantically inconsistent with their context (e.g., a shoe on a bathroom sink) attract more fixations than semantically consistent objects (e.g., a hairbrush on the sink). The typical interpretation of this effect is that fixations are driven towards inconsistent objects because they “contain greater meaning”. In the current study, we directly tested this explanation using contextualized meaning maps (cMMs), a method to quantify the spatial distribution of ‘meaning’ across an image. These maps aggregate crowd-sourced ratings of the meaningfulness of local image patches into a distribution over the image. Importantly, patch ratings are provided by raters who know the image from which the patches originate. Therefore, when providing their ratings, raters can take into account the extent to which the objects in the patches are consistent with the scene context. In our first experiment, we collected eye-tracking data and created cMMs for scenes in which the consistency of objects with the scene was experimentally manipulated. As predicted, human observers fixated more on inconsistent than on consistent objects. However, if anything, raters rated patches containing semantic inconsistencies as less meaningful, challenging the long-held notion that semantically inconsistent objects “contain greater meaning”. This finding was confirmed in Experiment 2, where 140 raters rated a carefully selected set of image patches. Patches extracted from the same location within a scene were rated as less meaningful when they contained inconsistent, rather than consistent, objects.
In summary, we demonstrated that, in contrast to a long-held view, semantically inconsistent objects might be experienced as less (not more) meaningful than their consistent counterparts, and that cMMs do not capture prototypical influences of image meaning on the guidance of human gaze.