Constantin A. Rothkopf, Dana H. Ballard, Brian T. Sullivan, Kaya de Barbaro; Bayesian modeling of task dependent visual attention strategy in a virtual reality environment. Journal of Vision 2005;5(8):920. doi: 10.1167/5.8.920.
The deployment of visual attention is commonly framed as being determined by the properties of the visual scene. Top-down factors have been acknowledged, but typically only as modulating bottom-up saliency. Alternative models propose to understand visual attention in terms of the requirements of goal-directed tasks; in such a setting, the underlying task structure drives the observed fixation patterns.
Here we report results from experiments and a model designed to test the relative importance of these alternatives by quantifying the task dependence of subjects' visual strategies. Human subjects walked along a walkway, avoided obstacles, and picked up litter in a virtual reality environment. The spatial distributions of objects, as well as the combinations and priorities of the different tasks, were varied across subjects. Additionally, a large number of highly salient distracters were embedded in the visual scene on control trials. Subjects' eye and head movements were recorded using a head-mounted eye tracker integrated into the virtual reality display.
The sequential order of the image features at fixated locations was then analyzed using a Bayesian formulation: the fixated features, in the context of the visual scene, are the observable variables, and the model learns the best-fitting parameters for hidden internal states corresponding to features of the tasks the subject was engaged in. Applying the model to the empirical data shows that the best fit in terms of posterior probability is obtained by incorporating an explicit level representing the scene context together with the temporal sequence of fixated features. We show that this context-augmented model captures the employed strategy better than an HMM with a comparable number of free parameters. Additionally, we demonstrate that the trained models can be used to recognize which task a subject is carrying out, using only the fixated features and the scene context.
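The abstract does not give implementation details, but the task-recognition step can be illustrated with a minimal sketch: train one hidden-state model per task and assign a new fixation sequence to the task whose model yields the highest likelihood. The sketch below uses a plain forward algorithm with context-conditioned emissions; the state/feature/context sizes, the parameter values, and the task names are illustrative assumptions, not the authors' actual parameterization.

```python
import numpy as np

def forward_log_likelihood(obs, contexts, pi, A, B):
    """Log-likelihood of a fixation-feature sequence under a
    context-augmented HMM.

    obs      : fixated-feature indices over time (length T)
    contexts : scene-context index at each fixation (length T)
    pi       : (S,) initial hidden-state distribution
    A        : (S, S) hidden-state transition matrix
    B        : (C, S, K) emission probabilities, one (S, K) matrix per context
    """
    # log-space forward recursion to avoid underflow on long sequences
    log_alpha = np.log(pi) + np.log(B[contexts[0], :, obs[0]])
    for t in range(1, len(obs)):
        log_trans = log_alpha[:, None] + np.log(A)            # (S, S)
        log_alpha = (np.logaddexp.reduce(log_trans, axis=0)
                     + np.log(B[contexts[t], :, obs[t]]))
    return np.logaddexp.reduce(log_alpha)

def recognize_task(obs, contexts, task_models):
    """Return the task whose trained model best explains the sequence."""
    scores = {task: forward_log_likelihood(obs, contexts, *params)
              for task, params in task_models.items()}
    return max(scores, key=scores.get)

# --- illustrative toy models: 2 hidden states, 3 features, 2 contexts ---
rng = np.random.default_rng(0)
def toy_model():
    pi = np.array([0.6, 0.4])
    A = np.array([[0.8, 0.2],
                  [0.3, 0.7]])
    B = rng.dirichlet(np.ones(3), size=(2, 2))  # (context, state, feature)
    return pi, A, B

task_models = {"avoid_obstacles": toy_model(), "pick_up_litter": toy_model()}
obs = [0, 2, 1, 1, 0]        # fixated-feature indices over time
contexts = [0, 0, 1, 1, 1]   # scene-context index at each fixation
print(recognize_task(obs, contexts, task_models))
```

In this sketch the scene context only selects which emission matrix is active; in principle it could also condition the transition dynamics, and the abstract does not specify which choice the authors made.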