Yupei Chen, Gregory Zelinsky; Predicting Goal-directed Attention Control Using Inverse Reinforcement Learning and COCO-Search18. Journal of Vision 2020;20(11):1632. doi: https://doi.org/10.1167/jov.20.11.1632.
© ARVO (1962-2015); The Authors (2016-present)
A great deal is known about goal-directed attention in the context of simple features and hand-crafted stimuli, but far less is known about how attention is directed to cups, chairs, or most other common objects of search. Recent work predicted the fixations made during search for target-object categories using high-dimensional feature representations learned by deep neural networks, but these networks were trained for object classification, which may explain their modest success in predicting attention control. We adopt the very different approach of learning these control features directly from observations of search behavior. Specifically, we use adversarial inverse reinforcement learning (IRL), a state-of-the-art imitation-learning method from the machine-learning literature, to learn a reward function for mimicking the gaze fixations made during search. This was not previously possible because no image dataset labeled with search fixations was large enough to train a deep network. Here we introduce COCO-Search18, a dataset of 10 participants searching for each of 18 target-object categories (cups, chairs, etc.) in 6202 images. With ~300,000 search fixations, COCO-Search18 is the largest dataset of search behavior currently available. After training and evaluating on COCO-Search18 (70% trainval, 30% test) and recovering the reward maps used for fixation prediction, we found that human and IRL searchers both fixated all 18 target categories well above chance, on average in fewer than 3 saccades (human: 2.62, model: 2.05). We also found that searchers agreed in assigning reward to certain non-target objects and regions when searching for different target categories, meaning that search was guided by features having a spatial or semantic relationship to the target, in addition to target features. The different reward maps learned for different searchers will also be discussed.
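Search efficiency is reported above as the average number of saccades needed to fixate the target. The following is a minimal Python sketch of one way such a measure could be computed, not the authors' analysis code. It assumes each trial is a sequence of (x, y) fixations in image pixels plus a target bounding box (x, y, width, height), and that the first recorded fixation is the starting fixation before any search saccade. The function names and data layout are hypothetical and do not describe COCO-Search18's actual file format.

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple

Fixation = Tuple[float, float]            # (x, y) in image pixels (assumed layout)
BBox = Tuple[float, float, float, float]  # (x, y, width, height) of the target (assumed)
Trial = Tuple[Sequence[Fixation], BBox]


def in_bbox(fix: Fixation, box: BBox) -> bool:
    """Return True if the fixation lands inside the target bounding box."""
    x, y = fix
    bx, by, bw, bh = box
    return bx <= x <= bx + bw and by <= y <= by + bh


def saccades_to_target(fixations: Sequence[Fixation], box: BBox) -> Optional[int]:
    """Number of saccades made before the target is first fixated.

    Assumes fixations[0] is the initial (pre-search) fixation, so the
    fixation at index i was reached by the i-th saccade. Returns None
    if the target is never fixated on this trial.
    """
    for i, fix in enumerate(fixations):
        if in_bbox(fix, box):
            return i
    return None


def mean_saccades_to_target(trials: List[Trial]) -> float:
    """Average saccades-to-target over trials in which the target was fixated."""
    counts = [saccades_to_target(fixes, box) for fixes, box in trials]
    found = [c for c in counts if c is not None]
    return mean(found) if found else float("nan")


# Toy example with made-up coordinates (not data from COCO-Search18).
target = (300.0, 220.0, 50.0, 50.0)
trials: List[Trial] = [
    ([(256, 192), (100, 80), (320, 240)], target),  # target reached on saccade 2
    ([(256, 192), (310, 230)], target),             # target reached on saccade 1
    ([(256, 192), (10, 10)], target),               # target never fixated
]
print(mean_saccades_to_target(trials))
```

Excluding trials where the target is never fixated is one design choice among several; a fuller analysis might instead report the proportion of trials in which the target was fixated alongside this average, or cap each trial at a maximum number of fixations.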
We show that, using IRL and the psychologically meaningful principle of reward, it is possible to learn the visual features used in goal-directed attention control.