Do Hyong Koh, Akram Bayat, Anubhkaw Nand, Marc Pomplun, Shaohua Jia; Eye Movements as Indicators of Scene Grammar Inconsistencies. Journal of Vision 2018;18(10):1280. doi: https://doi.org/10.1167/18.10.1280.
© ARVO (1962-2015); The Authors (2016-present)
Saliency models are able to predict the locations of gaze fixations in free-viewing tasks at an above-chance level, meaning that subjects tend to fixate on higher-saliency objects sooner and more often. However, these models are based on low-level features, and thus their predictive power diminishes for certain viewing tasks and scenes that involve high-level cognition. High-level features such as scene semantics can cause discrepancies between the predicted saliency map and the recorded eye movements in the scene. Furthermore, it is known that attentional capture by high-level, semantic information tends to go along with increases in fixation duration and pupil size variance. An interesting question then is: To what extent can these behavioral variables predict the influence of high-level factors on the distribution of visual attention, as indicated by the difference between saliency and fixation maps? As a starting point, we considered the effect of objects that violate the scene grammar. Such objects are inconsistent with the embedding scene, such as a printer in a kitchen, or with the laws of physics, such as a plate floating in the air. Subjects viewed a total of 128 scenes, and their task was to indicate whether each scene contained an inconsistency, which was the case in 64 of them. We measured and analyzed the subjects' eye movements and changes in pupil size during this task. Our initial data analysis showed that subjects were indeed likely to fixate on the violating objects early in the task and with longer fixation durations. In terms of pupillary responses, subjects showed high pupil-size variance around the time when they recognized the scene grammar violation. A preliminary evaluation of these data indicated that these two variables might indeed be robust indicators of high-level scene features, at least in the present scene-grammar-based task.
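The abstract does not say how the difference between saliency and fixation maps was computed. As a minimal sketch of one plausible approach, the Python below builds a duration-weighted fixation density map, normalizes both maps to sum to 1, and subtracts them. Everything here is an assumption for illustration: the function names, the Gaussian width `sigma`, the `(x, y, duration)` fixation format, and the toy data are hypothetical and are not taken from the study.

```python
import numpy as np

def fixation_density(fixations, shape, sigma=2.0):
    """Sum one Gaussian per fixation, weighted by duration, then normalize to sum to 1.

    fixations: iterable of (x, y, duration_ms) tuples (hypothetical format).
    shape: (height, width) of the map in pixels.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    density = np.zeros(shape, dtype=float)
    for x, y, duration in fixations:
        density += duration * np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    return density / density.sum()

def normalize(saliency):
    """Scale a non-negative, not-all-zero saliency map so that it sums to 1."""
    s = np.asarray(saliency, dtype=float)
    return s / s.sum()

def difference_map(saliency, fixations, sigma=2.0):
    """Fixation density minus saliency; positive values mark regions that
    attracted more (or longer) fixations than the saliency map predicts."""
    return fixation_density(fixations, np.shape(saliency), sigma) - normalize(saliency)

# Toy data (made up): one salient spot at (x=5, y=5) on a faint baseline,
# plus a long fixation on a low-saliency location at (x=15, y=15).
ys, xs = np.mgrid[0:20, 0:20]
saliency = 0.01 + np.exp(-((xs - 5) ** 2 + (ys - 5) ** 2) / (2 * 2.0 ** 2))
fixations = [(5, 5, 150), (15, 15, 400)]

diff = difference_map(saliency, fixations)
print("under-predicted region:", diff[15, 15] > 0)
print("over-predicted region:", diff[5, 5] < 0)
```

In this toy case the long fixation at the low-saliency location yields a positive difference, which is the kind of discrepancy the abstract attributes to high-level factors. Weighting by duration is a design choice of this sketch; an unweighted fixation count map would be an equally reasonable starting point.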
Meeting abstract presented at VSS 2018