August 2023, Volume 23, Issue 9
Open Access
Vision Sciences Society Annual Meeting Abstract
Modeling the dynamics of spreading attention in objects: Do transformers behave like humans?
Author Affiliations
  • Hossein Adeli
    Stony Brook University
  • Seoyoung Ahn
    Stony Brook University
  • Nikolaus Kriegeskorte
    Columbia University
  • Gregory Zelinsky
    Stony Brook University
Journal of Vision August 2023, Vol.23, 5978. doi:https://doi.org/10.1167/jov.23.9.5978

© ARVO (1962-2015); The Authors (2016-present)

Abstract

Transformers have recently achieved state-of-the-art performance in many domains, including object detection and grouping in images. Despite their success, however, transformers are controversial as models of the human brain. Importantly, these models have not been shown to capture the way humans group and perceive objects. Here we explore the potential for the attention mechanism in transformers to map onto the dynamics of human object-based attention and grouping. We probe the mechanisms of object-based attention using a two-dot paradigm, in which two markers are placed on an image and the task is to indicate whether they lie on the same object or on different objects. Previous related work found that human reaction time in this task varies with the difficulty of object grouping and the spread of attention within an object. Our model first processes images through a convolutional neural network and then through a transformer network to obtain the self-attention weights between different pieces of the image, each represented by a token. The model then “spreads” attention through these self-attention weights: starting from the token at the first marker location, it accesses all tokens with strong self-attention weights to that starting-marker token, supporting an unrestricted spread of attention in the selection of the other tokens. From among these strongly connected tokens, the token closest to the second marker is then selected, corresponding to the hypothesized active spreading of attention in the two-dot task. This process repeats until attention reaches the token at the second marker. We show that the number of steps taken in this image-dependent spread of attention predicts subjects’ reaction times. Our work shows that the dynamically formed self-attention connections in transformers play a role similar to that of feedback and lateral connections in the spread of object-based attention in human vision.
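
The spreading procedure described above can be summarized as a simple iterative search over the transformer's token self-attention matrix. The sketch below is not the authors' implementation; it assumes a precomputed token-by-token attention matrix and token grid coordinates, and the function name spread_attention, the threshold parameter, and the toy inputs are illustrative assumptions.

```python
# Minimal sketch of the attention-spreading idea, assuming a transformer's
# token self-attention matrix and token positions are already available.
# All names and parameter values here are illustrative, not the published code.

import numpy as np

def spread_attention(attn, token_xy, start, target, threshold=0.1, max_steps=100):
    """Greedily spread attention from the `start` token toward the `target` token.

    attn      : (N, N) self-attention weights between N image tokens
    token_xy  : (N, 2) spatial coordinates of each token (e.g., patch centers)
    start     : index of the token under the first marker
    target    : index of the token under the second marker
    threshold : minimum attention weight counted as a "strong" connection
    Returns the number of spreading steps (a proxy for reaction time),
    or None if the target token was never reached.
    """
    current = start
    visited = {start}
    for step in range(1, max_steps + 1):
        # All unvisited tokens strongly connected to the current token
        # (the unrestricted spread of attention).
        strong = np.where(attn[current] >= threshold)[0]
        strong = [t for t in strong if t not in visited]
        if not strong:
            return None  # spread stalled before reaching the second marker
        if target in strong:
            return step  # attention has reached the token at the second marker
        # Among the strongly connected tokens, move to the one closest to the target.
        dists = np.linalg.norm(token_xy[strong] - token_xy[target], axis=1)
        current = strong[int(np.argmin(dists))]
        visited.add(current)
    return None

# Toy usage with a random attention matrix on a 14 x 14 token grid.
rng = np.random.default_rng(0)
n = 14 * 14
attn = rng.dirichlet(np.ones(n), size=n)  # each row sums to 1, like softmax attention
xy = np.array([[i // 14, i % 14] for i in range(n)], dtype=float)
steps = spread_attention(attn, xy, start=0, target=n - 1, threshold=1.5 / n)
print("spreading steps (RT proxy):", steps)
```

In this sketch, the step count returned by the function plays the role of the model's estimate of reaction time in the two-dot task; in practice the attention matrix would come from a transformer applied to CNN features of the image rather than from random numbers.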
