Gaston Bujia, Melanie Sclar, Sebastian Vita, Guillermo Solovey, Juan Kamienkowski; Bayesian model of human visual search in natural images. Journal of Vision 2020;20(11):1596. doi: https://doi.org/10.1167/jov.20.11.1596.
© ARVO (1962-2015); The Authors (2016-present)
The ability to efficiently find objects in the visual field is essential for almost any everyday visual activity. In recent decades, models that accurately predict the most likely fixation locations (saliency maps) have improved substantially, although their performance seems to have reached a plateau. Today, one of the biggest challenges in the field is to go beyond saliency maps and predict the full sequence of fixations across different visual tasks. In particular, for visual search, Bayesian observers have been proposed that model search behavior as an active sampling process: at each fixation, the observer incorporates new information and updates the probability of finding the target at every location.
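The fixation-by-fixation Bayesian update described above can be sketched on a toy problem. Everything here is an illustrative assumption, not the paper's actual model: a 1-D grid of candidate locations, a Gaussian visibility function that makes evidence less reliable far from the fixated cell, and Gaussian measurement noise.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 20                           # toy 1-D grid of candidate target locations
target = 13                      # true target cell, unknown to the searcher
prior = np.full(n, 1.0 / n)      # uniform prior here; the paper's model uses a saliency map

def visibility(fix, sigma=3.0):
    # Illustrative assumption: detection reliability decays with distance from fixation.
    d = np.arange(n) - fix
    return np.exp(-d**2 / (2 * sigma**2))

def bayes_update(prior, fix, noise_sd=0.1):
    """One fixation: draw noisy evidence, then update P(target at j) by Bayes' rule."""
    vis = visibility(fix)
    # Measurement: mean vis[i] at the cell where the target actually is, pure noise elsewhere.
    m = vis * (np.arange(n) == target) + rng.normal(0.0, noise_sd, n)
    # Gaussian log-likelihood of m under each hypothesis "target is at cell j".
    loglik = np.array([
        -np.sum((m - vis * (np.arange(n) == j)) ** 2) / (2 * noise_sd**2)
        for j in range(n)
    ])
    post = prior * np.exp(loglik - loglik.max())
    return post / post.sum()

posterior = bayes_update(prior, fix=12)  # fixate near the target
next_fix = int(np.argmax(posterior))     # almost surely equals `target` with this low noise
```

A single fixation close to the target is enough to concentrate the posterior there, because the visibility-weighted evidence strongly favors the correct hypothesis.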
Here, we combine these approaches for visual search in natural images and propose a model that predicts the whole scanpath. Our Bayesian Searcher (BS) uses a saliency map as its prior and computes the most likely next fixation location given all previous fixations, taking into account visual properties of the target and the scene. We collected eye-movement data during visual search (N=57) in 134 natural indoor scenes and compared different variants of the model and its parameters.
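A whole scanpath can then be generated greedily: seed the posterior with a saliency-like prior, fixate its mode, update, and repeat until the target is fixated. The sketch below is a minimal toy version under the same illustrative assumptions as above (1-D grid, Gaussian visibility, Gaussian noise); a random vector stands in for a real saliency map.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 20                            # toy 1-D grid of candidate target locations
target = 7                        # true target cell, unknown to the searcher
saliency = rng.random(n)          # stand-in for a real saliency map
prior = saliency / saliency.sum() # saliency map as prior, as in the BS model

def visibility(fix, sigma=3.0):
    # Illustrative assumption: detection reliability decays with distance from fixation.
    d = np.arange(n) - fix
    return np.exp(-d**2 / (2 * sigma**2))

def step(post, fix, noise_sd=0.1):
    """One fixation at `fix`: noisy evidence, then a Bayesian posterior update."""
    vis = visibility(fix)
    m = vis * (np.arange(n) == target) + rng.normal(0.0, noise_sd, n)
    loglik = np.array([
        -np.sum((m - vis * (np.arange(n) == j)) ** 2) / (2 * noise_sd**2)
        for j in range(n)
    ])
    post = post * np.exp(loglik - loglik.max())
    return post / post.sum()

fix = int(np.argmax(prior))       # first fixation at the prior's mode
scanpath = [fix]
post = prior
for _ in range(15):               # stop after a fixation budget or when the target is found
    if fix == target:
        break
    post = step(post, fix)
    fix = int(np.argmax(post))    # next fixation: currently most probable location
    scanpath.append(fix)
```

Note that no explicit inhibition of return is needed in this sketch: fixating an empty region suppresses the posterior around it, so the greedy rule naturally moves on to unexplored, high-prior regions.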
First, considering only the third fixation of each scanpath, we compared several state-of-the-art saliency maps on our dataset, reaching AUC performances similar to those reported on other datasets. After the third fixation, however, all performances dropped to almost chance level. This suggests that saliency maps alone are not enough when top-down task information is critical. Second, and more strikingly, when comparing BS models, their behavior was indistinguishable from that of humans across all fixations, both in the percentage of targets found as a function of fixation rank and in scanpath similarity, reproducing the entire sequence of eye movements.