Abstract
The visual world is complex, yet visual information processing is effortless. During scene viewing, semantically related objects are prioritized for attention (Hayes & Henderson, 2021). Previous work has defined the semantic relations relevant for gaze guidance using models from computational linguistics. Here, we extend these findings by investigating relationships between objects derived from their visual scene contexts. Neuroimaging and behavioral data have shown that objects that tend to co-occur in scenes are represented more similarly in the anterior parahippocampal place area (aPPA) and receive higher similarity judgements in behavioral tasks (Bonner & Epstein, 2020; Magri, Elmoznino, & Bonner, 2023). Here, we investigate whether measures of object-object relations derived from visual co-occurrence statistics in scenes predict eye-movement behavior. Eye-movement data were collected from 100 participants, each of whom viewed 100 scenes in a free-viewing task. Using object label embeddings from the object2vec model (Bonner & Epstein, 2020), we constructed map-level representations encoding the similarity between objects based on their likelihood of appearing within the same scene. We used generalized mixed-effects models to estimate gaze behavior as a function of co-occurrence values. Our results suggest that objects that are more strongly related to the other objects in a scene, as indexed by their co-occurrence likelihood, are more likely to be fixated. These findings underscore the role of statistical regularities, particularly co-occurrence statistics within visual contexts, in shaping efficient eye-movement behavior. Our study thus suggests that object co-occurrence forms an integral part of the semantic representations that guide eye movements, contributing to our understanding of the representational dimensions of objects in scene exploration.
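To make the map-level relatedness measure concrete, the following is a minimal illustrative sketch (not the authors' actual pipeline): given co-occurrence-based embedding vectors for the objects in a scene (e.g., object2vec-style embeddings; the example vectors below are hypothetical), each object's relatedness score is computed as its mean cosine similarity to all other objects in the same scene.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def scene_relatedness(embeddings):
    """For each object in a scene, the mean cosine similarity
    to every other object in that scene (self excluded)."""
    n = len(embeddings)
    return [
        sum(cosine(embeddings[i], embeddings[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]

# Hypothetical 2-D embeddings for three objects in one scene:
# two objects that often co-occur (identical vectors) and one unrelated object.
scores = scene_relatedness([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
```

In this toy example, the two co-occurring objects receive higher relatedness scores than the unrelated one; scores like these would then enter a mixed-effects model as a predictor of fixation probability.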