Journal of Vision, September 2018, Volume 18, Issue 10
Open Access
Vision Sciences Society Annual Meeting Abstract
Emergence of visuospatial attention in a brain-inspired deep neural network
Author Affiliations
  • Gregory Zelinsky
    Department of Psychology, Stony Brook University; Department of Computer Science, Stony Brook University
  • Hossein Adeli
    Department of Psychology, Stony Brook University
Journal of Vision September 2018, Vol. 18, 898.
      © ARVO (1962-2015); The Authors (2016-present)


We show that a brain-inspired deep neural network (DNN) learns to shift its attention in order to accurately and efficiently detect objects in a search task. This model of the ATTention Network (ATTNet) consists of three interacting components: (1) early parallel visual processing (V1-V4), modeled by the convolutional layers of a CNN trained for object classification; (2) ventral processing, consisting of max-pooling spatially localized V4 feature maps to IT units; and (3) dorsal processing, modeled by a recurrent neural network (RNN) that learns to bias features in the visual input toward target-category goals. ATTNet was trained using reinforcement learning on 4000 images (Microsoft COCO) of clocks, microwave ovens, or computer mice. As ATTNet greedily tried to maximize its reward, it learned an efficient strategy for sampling information from an image: restricting its visual inputs to the regions offering the most evidence for the target. This selective routing behavior was quantified as a "priority map" and used to predict the gaze fixations made by 30 subjects searching 240 images (also from COCO, but disjoint from the training set) for a target from one of the three trained categories. Target-absent trials (50%) consisted of COCO images having the same scene type ("kitchen") but not depicting the target category ("microwave"). As predicted, both subjects and ATTNet showed evidence of attention being preferentially directed to target goals, behaviorally measured as oculomotor guidance to the targets. Several other well-established findings from the search literature were also observed, such as more attention shifts in target-absent than in target-present trials. By learning to shift its attention to target-like image patterns, ATTNet becomes the first behaviorally validated DNN model of goal-directed attention.
More fundamentally, ATTNet learned to spatially route its visual inputs so as to maximize target detection success and reward, and in so doing learned to shift its "attention".
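The priority-map routing described above can be illustrated with a minimal sketch. All shapes, weights, and names here are hypothetical (ATTNet's ventral stream is a trained CNN, its dorsal controller is an RNN, and the target bias is learned by reinforcement, none of which is reproduced here): target-relevant channel weights turn V4-like feature maps into a single priority map, whose peak determines where visual input is routed next.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical V4-like feature maps: C channels over an H x W spatial grid.
C, H, W = 8, 16, 16
feature_maps = rng.random((C, H, W))

# Dorsal bias vector for the target category (learned in ATTNet;
# random here purely for illustration).
target_bias = rng.random(C)

# Priority map: weight each channel by its target relevance and sum
# across channels, giving one priority value per spatial location.
priority_map = np.tensordot(target_bias, feature_maps, axes=1)  # shape (H, W)

# "Shift attention" to the peak of the priority map, then route only a
# small window of the feature maps around that location onward.
y, x = np.unravel_index(priority_map.argmax(), priority_map.shape)
r = 2  # half-width of the attended window
window = feature_maps[:, max(0, y - r):y + r + 1, max(0, x - r):x + r + 1]
print("priority map:", priority_map.shape, "peak:", (y, x), "window:", window.shape)
```

In the full model this selection is made repeatedly, with each routed window updating the recurrent dorsal state before the next shift; the sketch shows only a single shift.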

Meeting abstract presented at VSS 2018

