Vision Sciences Society Annual Meeting Abstract | December 2022
A brain-inspired object-based attention network for multi-object recognition and visual reasoning
Author Affiliations
  • Hossein Adeli
    Department of Psychology, Stony Brook University
  • Seoyoung Ahn
    Department of Psychology, Stony Brook University
  • Gregory Zelinsky
    Department of Psychology, Stony Brook University
    Department of Computer Science, Stony Brook University
Journal of Vision December 2022, Vol. 22, 4294. https://doi.org/10.1167/jov.22.14.4294
Abstract

To achieve behavioral goals, the visual system recognizes and processes the objects in a scene using a sequence of selective glimpses, but how is this attention control learned? Here we present an encoder-decoder model inspired by the interacting visual pathways that make up the recognition-attention system in the brain. The encoder can be mapped onto the ventral ‘what’ pathway: it uses a hierarchy of modules employing feedforward, recurrent, and capsule layers to obtain an object-centric hidden representation for classification. This object-centric capsule representation feeds into the dorsal ‘where’ pathway, where the evolving recurrent representation provides top-down attentional modulation to plan subsequent glimpses (analogous to fixations) that route different parts of the visual input for processing, with the encoding and decoding steps taken iteratively. We evaluate our model on multi-object recognition (highly overlapping digits, digits among distracting clutter) and visual reasoning tasks. Our model achieved 95% accuracy on classifying highly overlapping digits (80% overlap between bounding boxes) and significantly outperformed a Capsule Network model (<90%) trained on the same dataset, while using a third as many parameters. Ablation studies show how the recurrent, feedforward, and glimpse mechanisms each contribute to the model's performance on this task. In a same-different task (from the Synthetic Visual Reasoning Tasks benchmark), our model achieved near-perfect accuracy (>99%) at comparing two randomly generated objects, similar to ResNet and DenseNet models and outperforming AlexNet, VGG, and CORnet models. On a challenging generalization task, where the model is tested on stimuli different from the training set, our model achieved 82% accuracy, outperforming larger ResNet models (71%) and demonstrating the benefit of contextualized recurrent computation paired with an object-centric attention mechanism that glimpses the objects. Our work takes a step toward more biologically plausible architectures by integrating recurrent object-centric representations with the planning of attentional glimpses.
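
The iterative encode-classify-plan loop described above can be illustrated in code. The following is a minimal PyTorch sketch of that control flow only, not the authors' implementation: every name here (GlimpseModel, extract_glimpse, the where head) is our illustrative invention, and simple convolutional and GRU layers stand in for the capsule and recurrent hierarchies of the actual model.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GlimpseModel(nn.Module):
    # Hypothetical sketch: the encoder ('what') classifies each glimpse and
    # updates a recurrent state; the decoder ('where') uses that state to
    # plan where the next glimpse lands. Encoding and decoding alternate.
    def __init__(self, glimpse_size=16, hidden=128, n_classes=10):
        super().__init__()
        self.glimpse_size = glimpse_size
        self.hidden = hidden
        # Stand-in for the feedforward/capsule encoder hierarchy.
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(32 * 4 * 4, hidden))
        self.rnn = nn.GRUCell(hidden, hidden)   # recurrent 'what' state
        self.classifier = nn.Linear(hidden, n_classes)
        self.where = nn.Linear(hidden, 2)       # top-down glimpse planner

    def extract_glimpse(self, img, center):
        # Differentiable crop around `center` ((x, y) in [-1, 1]) using an
        # affine sampling grid, i.e. routing one part of the visual input.
        B = img.size(0)
        theta = torch.zeros(B, 2, 3, device=img.device)
        theta[:, 0, 0] = self.glimpse_size / img.size(-1)
        theta[:, 1, 1] = self.glimpse_size / img.size(-2)
        theta[:, :, 2] = center
        grid = F.affine_grid(
            theta, (B, 1, self.glimpse_size, self.glimpse_size),
            align_corners=False)
        return F.grid_sample(img, grid, align_corners=False)

    def forward(self, img, n_glimpses=3):
        B = img.size(0)
        h = torch.zeros(B, self.hidden, device=img.device)
        center = torch.zeros(B, 2, device=img.device)  # start at image center
        logits = []
        for _ in range(n_glimpses):
            g = self.extract_glimpse(img, center)   # route selected input
            h = self.rnn(self.features(g), h)       # encode: update state
            logits.append(self.classifier(h))       # 'what': classify
            center = torch.tanh(self.where(h))      # 'where': next glimpse
        return torch.stack(logits, dim=1)           # (B, n_glimpses, classes)

Calling GlimpseModel()(torch.randn(8, 1, 64, 64)) yields per-glimpse class logits; in the actual model, the conv/GRU stand-ins would be replaced by the capsule and recurrent layers, and the glimpse planner would be trained end to end with the classifier.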
