Abstract
Humans actively shift their gaze when viewing dynamic real-world scenes. While there is a long-standing interest in understanding this behavior, the complexity of natural scenes makes it difficult to analyze experimentally. During free viewing, it has long been thought that the targets of eye movements are selected based on bottom-up saliency, but evidence is accumulating that objects play an important role in the selection process. Here, we use a computational scanpath prediction framework to systematically compare human eye-tracking data with the predictions of models that incorporate different combinations of object and saliency information. We model saccades as sequential decision processes between potential targets. To investigate the relevance of object-based selection, we compare an object-based model, in which saccades target semantic objects, with a location-based model, in which saccades target individual pixels. Target selection in both models depends on the potential targets’ eccentricity, the previous scanpath history, and target relevance. Target relevance is implemented as a center bias (based on the distance to the center), as low-level saliency based on low-level features, or as high-level saliency predicted by a deep neural network. We optimize each model’s parameters with evolutionary algorithms, fitting them to reproduce the saccade amplitude and fixation duration distributions of free-viewing eye-tracking data on videos of the VidCom dataset. We assess model performance with respect to spatial and temporal fixation behavior, including the proportions of fixations that explore the background and that detect, inspect, and revisit objects. Human data were best predicted by the object-based model with low-level saliency, followed by the location-based model with high-level saliency and the object-based model combined with a center bias. The location-based model with low-level saliency or center bias mainly explores the background.
These results support the view that object-level attentional units play an important role in human exploration behavior, while saliency helps to prioritize between objects.