Cameron Ellis, Patrick Harding, Judith Fan, Nicholas Turk-Browne; How temporal context predicts eye gaze for dynamic stimuli. Journal of Vision 2016;16(12):328. https://doi.org/10.1167/16.12.328.
What determines where we look during natural visual experience? Computer vision algorithms successfully account for gaze behavior in static scenes, but may be insufficient for modeling visual selection in dynamic scenes, which will likely require consideration of how image features evolve over time. For instance, where we attend might be constrained by where we have attended previously, by how long an object has been visible, and by expectations about when and where something will appear. To start bridging this gap, we adapted an algorithm originally developed to extract regions of high visual interest from static images (Harding & Robertson, 2013, Cogn Comput) to additionally capture how temporal information guides visual selection in dynamic scenes. Eye gaze was monitored in a group of 24 observers while they watched a series of short video clips depicting real-world indoor and outdoor scenes (Li et al., 2011, IMAVS). To ensure engagement, observers were occasionally prompted to describe what happened in the previous clip. Static visual-interest maps, generated by applying the original algorithm to each video frame independently, reliably predicted the distribution of eye fixations. However, we were able to influence model performance by adjusting two new parameters that incorporated information about temporal context when creating visual-interest maps: 'history', which placed additional weight on regions with high visual interest in preceding frames; and 'prediction', which placed additional weight on regions with high visual interest in subsequent frames. Further analyses examine whether the history and prediction of actual eye fixations provide additional explanatory power, how long a temporal window into the past and future is most informative, and how different time samples should be weighted. The ultimate goal is to develop a refined model that better accounts for natural visual behavior and to understand how temporal context biases the allocation of attention.
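The 'history' and 'prediction' parameters described above can be illustrated with a minimal sketch: a per-frame static visual-interest map is combined with exponentially decaying contributions from preceding and subsequent frames. All names, the window length, and the decay scheme here are illustrative assumptions, not details of the published model.

```python
import numpy as np

def temporal_interest_map(static_maps, t, history=0.5, prediction=0.5,
                          window=5, decay=0.8):
    """Combine the static visual-interest map at frame t with maps from
    neighboring frames.

    history    -- weight on preceding frames ('history' parameter)
    prediction -- weight on subsequent frames ('prediction' parameter)
    window     -- number of frames considered in each direction (assumed)
    decay      -- exponential down-weighting of more distant frames (assumed)
    """
    combined = static_maps[t].astype(float)  # start from the static map
    norm = 1.0
    for k in range(1, window + 1):
        w = decay ** k
        if t - k >= 0:                       # 'history': preceding frames
            combined = combined + history * w * static_maps[t - k]
            norm += history * w
        if t + k < len(static_maps):         # 'prediction': subsequent frames
            combined = combined + prediction * w * static_maps[t + k]
            norm += prediction * w
    return combined / norm                   # renormalize the weighted sum
```

Setting both weights to zero recovers the purely static map; varying `history`, `prediction`, and `window` corresponds to the analyses of how past and future time samples should be weighted and how long a temporal window is most informative.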
Meeting abstract presented at VSS 2016