Abstract
We show that a brain-inspired deep neural network (DNN) learns to shift its attention in order to accurately and efficiently detect objects in a search task. The model, called the ATTention Network (ATTNet), consists of three interacting components: (1) early parallel visual processing (V1-V4), modeled by the convolutional layers of a CNN trained for object classification; (2) ventral processing, consisting of max-pooling spatially-localized V4 feature maps onto IT units; and (3) dorsal processing, modeled by a recurrent neural network that learns to bias features in the visual input toward target-category goals. ATTNet was trained using reinforcement learning on 4000 images (Microsoft COCO) of clocks, microwave ovens, or computer mice. In greedily maximizing its reward, ATTNet learned an efficient strategy for sampling information from an image: restricting its visual inputs to the regions offering the most evidence for the target. This selective routing behavior was quantified as a "priority map" and used to predict the gaze fixations made by 30 subjects searching 240 images (also from COCO, but disjoint from the training set) for a target from one of the three trained categories. Target-absent trials (50%) consisted of COCO images depicting the same scene type (e.g., "kitchen") but not the target category (e.g., "microwave"). As predicted, both subjects and ATTNet showed evidence for attention being preferentially directed toward target goals, behaviorally measured as oculomotor guidance to the targets. Several other well-established findings in the search literature were also observed, such as more attention shifts in target-absent trials than in target-present trials. By learning to shift its attention to target-like image patterns, ATTNet becomes the first behaviorally-validated DNN model of goal-directed attention. More fundamentally, ATTNet learned to spatially route its visual inputs so as to maximize target detection success and reward, and in so doing learned to shift its "attention".
Meeting abstract presented at VSS 2018
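The sketch below is a minimal PyTorch illustration of the three-component architecture described in the abstract, not the authors' implementation. The module sizes, the `ATTNetSketch` class, the sigmoid gain used for dorsal feature biasing, the softmax-over-activation priority map, and the hard argmax "routing" of a single attended location are all assumptions made for illustration; the actual ATTNet used the convolutional layers of a pretrained classification CNN and was trained with reinforcement learning, which is not shown here.

```python
# Illustrative sketch only: component names, sizes, and the priority-map
# computation are assumptions, not the published ATTNet implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ATTNetSketch(nn.Module):
    def __init__(self, num_categories=3, feat_dim=64, rnn_dim=128):
        super().__init__()
        # (1) Early parallel visual processing (V1-V4 stand-in): a small
        #     convolutional stack (the abstract uses the conv layers of a
        #     CNN pretrained for object classification).
        self.early_visual = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # (3) Dorsal pathway: a recurrent network that turns the
        #     target-category goal into a bias over feature channels.
        self.goal_embed = nn.Embedding(num_categories, rnn_dim)
        self.dorsal_rnn = nn.GRUCell(feat_dim, rnn_dim)
        self.feature_bias = nn.Linear(rnn_dim, feat_dim)
        # (2) Ventral pathway: biased, spatially-localized features are read
        #     out by an "IT" unit scoring target presence at the attended spot.
        self.it_readout = nn.Linear(feat_dim, 1)

    def forward(self, image, target_category, n_glimpses=4):
        feats = self.early_visual(image)          # (B, C, H, W) "V4" maps
        B, C, H, W = feats.shape
        h = self.goal_embed(target_category)      # dorsal state set by the goal
        priority_maps, detections = [], []
        for _ in range(n_glimpses):
            # Dorsal bias: goal-driven gain on each feature channel.
            gain = torch.sigmoid(self.feature_bias(h)).view(B, C, 1, 1)
            biased = feats * gain
            # Priority map: spatial distribution of target evidence, here a
            # softmax over summed biased activation (an assumption).
            evidence = biased.sum(dim=1).view(B, -1)           # (B, H*W)
            priority = F.softmax(evidence, dim=1).view(B, H, W)
            priority_maps.append(priority)
            # Selective routing: attend the priority peak and route only that
            # location's features forward (hard attention).
            idx = priority.view(B, -1).argmax(dim=1)
            attended = biased.view(B, C, -1)[torch.arange(B), :, idx]  # (B, C)
            # Ventral/IT detection decision at the attended location.
            detections.append(torch.sigmoid(self.it_readout(attended)))
            # Update the dorsal state with the evidence just sampled.
            h = self.dorsal_rnn(attended, h)
        return priority_maps, detections


if __name__ == "__main__":
    model = ATTNetSketch()
    img = torch.rand(1, 3, 128, 128)     # stand-in for a COCO search image
    target = torch.tensor([0])           # e.g. category 0 = "clock"
    maps, dets = model(img, target, n_glimpses=4)
    print(maps[0].shape, dets[-1].item())
```

In this reading, the per-glimpse priority map is the quantity that would be compared against human fixation locations, and the sequence of attended locations is the model's analogue of a scanpath; how reward for correct detection was converted into a training signal is specific to the published model and is omitted here.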