Abstract
Visual changes in the environment are potentially behaviorally relevant. Even if the real world appears static at a given moment, humans might thus explore it differently from static images, which are commonly used in studies on visual attention. We hypothesized that the possibility of changes to a scene (“potential for action”) causes observers to pay more attention to things that are expected to change and, consequently, leads to more coherent viewing behavior across observers. To test this hypothesis, we recorded 80 videos of scenes that were relatively static initially before showing a critical change (e.g., a person starts walking; a message pops up on a phone). We first obtained ratings from participants about where the potential for action is highest based on the initial image of each scene. We then presented these scenes to an independent group of participants for free-viewing as either (1) static images for 10 seconds or (2) unfreezing videos, which remained frozen for 5 seconds (visually identical to the corresponding static condition) and then unfroze and played as a dynamic scene for the remaining 5 seconds. Blocking these conditions ensured that participants knew whether they could expect motion after 5 seconds. We found stronger coherence in gaze behavior across participants (measured by Normalized Scanpath Saliency) in the unfreezing-video condition than in the static-image condition, starting seconds before unfreezing. Furthermore, participants moved their eyes significantly more often toward parts of the scene with high potential-for-action ratings. For both measures, we observed a systematic increase in the difference between the static-image and unfreezing-video conditions over the freeze duration of the video, peaking right before the potential movement onset at 5 seconds. Our results reveal observers’ experience-based expectations, reinforcing that we can learn most about attention in a dynamic world by studying it in dynamic scenes.
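To make the coherence measure concrete, the following is a minimal sketch of how inter-observer Normalized Scanpath Saliency could be computed with a leave-one-out scheme. The abstract does not specify the analysis pipeline, so everything here is an assumption for illustration: the function names, the Gaussian smoothing width `sigma`, the pixel-coordinate fixation format, and the leave-one-out design itself.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D Gaussian kernel truncated at 3 sigma, normalized to sum to 1."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def fixation_map(fixations, shape, sigma):
    """Smoothed fixation density from (row, col) integer fixations.

    Assumes both image dimensions exceed the kernel length (6 * sigma + 1).
    """
    m = np.zeros(shape, dtype=float)
    for r, c in fixations:
        m[r, c] += 1.0
    k = gaussian_kernel(sigma)
    # Separable Gaussian blur: along columns, then along rows.
    m = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 0, m)
    m = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 1, m)
    return m

def nss(saliency, fixations):
    """Mean z-scored map value at the given fixation locations."""
    std = saliency.std()
    if std == 0:
        return 0.0
    z = (saliency - saliency.mean()) / std
    return float(np.mean([z[r, c] for r, c in fixations]))

def inter_observer_nss(fix_by_observer, shape, sigma=2.0):
    """Leave-one-out NSS: score each observer against everyone else's map."""
    scores = []
    for i, held_out in enumerate(fix_by_observer):
        others = [f for j, obs in enumerate(fix_by_observer) if j != i for f in obs]
        scores.append(nss(fixation_map(others, shape, sigma), held_out))
    return float(np.mean(scores))

# Hypothetical toy data: three observers on a 40 x 40 pixel image.
coherent = [[(20, 20), (21, 20)], [(20, 21), (19, 20)], [(20, 19), (21, 21)]]
scattered = [[(5, 5), (35, 35)], [(5, 35), (20, 20)], [(35, 5), (10, 30)]]
print(inter_observer_nss(coherent, (40, 40)))
print(inter_observer_nss(scattered, (40, 40)))
```

Under this sketch, observers who look at the same region yield a higher score than observers whose fixations are spread across the image. Applying it within successive time windows of the freeze period would give a coherence time course of the kind the abstract describes.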