Vision Sciences Society Annual Meeting Abstract | September 2024
Volume 24, Issue 10
Open Access
Three theories of object-based attention implemented in deep neural network models
Author Affiliations
  • Hossein Adeli
    Columbia University
  • Seoyoung Ahn
    Stony Brook University
  • Gregory Zelinsky
    Stony Brook University
  • Nikolaus Kriegeskorte
    Columbia University
Journal of Vision September 2024, Vol.24, 227. doi:https://doi.org/10.1167/jov.24.10.227

Citation: Hossein Adeli, Seoyoung Ahn, Gregory Zelinsky, Nikolaus Kriegeskorte; Three theories of object-based attention implemented in deep neural network models. Journal of Vision 2024;24(10):227. https://doi.org/10.1167/jov.24.10.227.

© ARVO (1962-2015); The Authors (2016-present)

Abstract

Understanding the computational mechanisms that transform visual features into coherent object percepts requires implementing theories in scalable models. Here we report implementations, using recent deep neural networks, of three previously proposed theories in which feature binding is achieved (1) through convergence in a hierarchy of representations, resulting in object files; (2) through a reconstructive or generative process that can target different features of an object; or (3) through the elevation of activation by attention spreading within an object via association fields. First, we present a model of object-based attention that relies on capsule networks to integrate the features of different objects in a scene. With this grouping mechanism, the model learns to sequentially attend to objects to perform multi-object recognition and visual reasoning. The second modeling study shows how top-down reconstructions of object-centric representations in a sequential autoencoder can target different parts of an object, yielding more robust and human-like object recognition. The last study demonstrates how object perception and attention could be mediated by flexible object-based association fields at multiple levels of the visual processing hierarchy. Transformers provide a key relational and associative computation that may also be present in the primate brain, albeit implemented by a different mechanism. We observed that representations in transformer-based vision models can predict human reaction times on an object grouping task, and we show that their feature maps can model the spreading of attention within an object.
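
As a concrete illustration of the third theory, the sketch below reads an object-based "spread" out of a pretrained vision transformer: one patch token is cued, and activation spreads to the remaining patches in proportion to feature similarity. This is a minimal sketch, not the authors' implementation; the choice of a timm ViT backbone, the cosine-similarity readout, the cued index, and the file name scene.jpg are all illustrative assumptions.

    # Minimal sketch (assumptions noted above): attention spreading within an
    # object, approximated by feature similarity between the patch tokens of a
    # pretrained vision transformer.
    import torch
    import timm
    from PIL import Image
    from timm.data import create_transform, resolve_data_config

    model = timm.create_model("vit_base_patch16_224", pretrained=True).eval()
    transform = create_transform(**resolve_data_config({}, model=model))

    # Preprocess one image to the model's expected input: (1, 3, 224, 224).
    img = transform(Image.open("scene.jpg").convert("RGB")).unsqueeze(0)

    with torch.no_grad():
        tokens = model.forward_features(img)  # (1, 197, 768): [CLS] + 14x14 patch tokens
    patches = tokens[0, 1:]                   # (196, 768) patch embeddings

    # Cue one patch (hypothetical location on an object) and measure how
    # strongly activation would spread to every other patch.
    cue = 90
    sim = torch.cosine_similarity(patches[cue].unsqueeze(0), patches, dim=-1)
    spread_map = sim.reshape(14, 14)          # high values mark the cued object's extent

Thresholding spread_map and asking whether two probed locations fall on the same region is one way such feature maps could be related to grouping reaction times, though the readouts used in the actual studies may differ.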
