Abstract
Stored semantic knowledge gained through experience is theorized to play a critical role in determining the attentional priority of objects in real-world scenes. However, the link between semantic knowledge and attention remains largely untested because semantics are difficult to quantify. The present study tested this link by combining a vector-space model of word semantics, derived from patterns of word use in written text and crowd-sourced knowledge about the world, with eye movements in real-world scenes. Within this approach, the vector-space model of word semantics (ConceptNet Numberbatch; Speer, Chin, & Havasi, 2016) served as a proxy for stored semantic knowledge gained from experience, and eye movements served as an index of attentional priority in scenes. Participants (N = 100) viewed 100 real-world scenes for 12 s each while performing memorization and aesthetic judgment tasks. A representation of the spatial distribution of object semantics in each scene was built by segmenting and labeling all objects, computing the mean cosine similarity between each object's ConceptNet vector and those of the other objects in that scene, and assigning each object's mean similarity value to the scene locations that object occupied. A logistic generalized linear mixed-effects model, with subject and scene as random effects, then tested how a scene region's semantic value was related to its likelihood of being fixated. The results showed that the higher the semantic value of a scene region, the more likely that region was to be fixated. These findings help bridge the gap between the theorized role of stored semantic knowledge and attentional control during scene viewing, and they highlight the usefulness of models of word semantics for testing theories of scene attention.
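The core object-similarity measure described above can be sketched in a few lines. This is an illustrative toy example, not the study's code: the vector values and object labels below are made up, and real ConceptNet Numberbatch embeddings are 300-dimensional vectors looked up by term.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def mean_object_similarity(target, others):
    # Mean cosine similarity between one object's vector and the
    # vectors of all other labeled objects in the same scene.
    return float(np.mean([cosine_similarity(target, o) for o in others]))

# Hypothetical 4-d embeddings standing in for ConceptNet Numberbatch
# vectors of objects segmented from a single scene.
scene_objects = {
    "sink":   np.array([0.9, 0.1, 0.0, 0.2]),
    "faucet": np.array([0.8, 0.2, 0.1, 0.1]),
    "towel":  np.array([0.5, 0.5, 0.2, 0.3]),
}

# Each object's mean similarity to the rest of the scene; in the study
# these values were then mapped onto the pixels each object occupied.
for name, vec in scene_objects.items():
    others = [v for n, v in scene_objects.items() if n != name]
    print(name, round(mean_object_similarity(vec, others), 3))
```

Assigning each object's mean similarity to the locations it occupies yields a spatial "semantic map" of the scene that can be compared against fixation locations.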