September 2024
Volume 24, Issue 10
Open Access
Vision Sciences Society Annual Meeting Abstract  |   September 2024
Long-range modulatory feedback connections in deep neural networks support top-down category-based attention
Talia Konkle, Harvard University
George A. Alvarez
Acknowledgements: CRCNS 8431439-01 to TK and NSF PAC COMP-COG 1946308 to GAA
Journal of Vision September 2024, Vol.24, 1307. doi:https://doi.org/10.1167/jov.24.10.1307

Citation: Talia Konkle, George A. Alvarez; Long-range modulatory feedback connections in deep neural networks support top-down category-based attention. Journal of Vision 2024;24(10):1307. https://doi.org/10.1167/jov.24.10.1307.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Many views of the world are cluttered with multiple kinds of objects, but at any given moment only a subset of this information may be task-relevant. Top-down attention can direct visual encoding based on internal goals: for example, when looking for keys, attention mechanisms select and amplify the relevant key-like image statistics, aiding detection and modulating gain across the visual hierarchy. Motivated by findings from visual cognition and visual neuroscience, we designed long-range modulatory feedback pathways for deep neural network models, with learnable channel-to-channel influences between source and destination layers that spatially broadcast feature-based gain signals. We trained a series of AlexNet models with varying feedback pathways on 1000-way ImageNet classification, requiring them to be accurate on both their feed-forward and their modulated passes. First, we show that models equipped with these feedback pathways naturally exhibit improved image recognition, adversarial robustness, and emergent brain alignment relative to baseline models. Critically, the final layer of these models can serve as a flexible communication interface between visual and cognitive systems: cognitive-level goals (e.g., “key?”) can be specified as vectors in the output space and naturally leverage the feedback projections to modulate earlier stages of hierarchical processing. We compare different ways to ‘cognitively steer’ the model and identify effective ones based on prototype representations, which dramatically improve recognition of categories in composite images containing multiple categories, succeeding where baseline feed-forward models fail. Further, these models recapitulate neural signatures of category-based attention; for example, when presented with a fixed face-scene composite image, face- and scene-selective units inside the model are modulated depending on whether the model attends to faces or to scenes.
Broadly, these models offer a mechanistic account of top-down category-based attention, demonstrating how long-range modulatory feedback pathways can allow different goal states to make flexible use of fixed visual circuitry, supporting dynamic, goal-based routing of incoming visual information.
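The abstract gives no equations for the feedback pathways. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that a pathway maps a vector from a source layer (here, the output layer) through a channel-to-channel weight matrix to one gain per destination channel, and that this gain is broadcast over all spatial positions. The 2·sigmoid gain nonlinearity, the use of unit-normalized mean output vectors as "prototype" goal vectors, the helper names `modulatory_feedback` and `category_prototypes`, and all weights and sizes are hypothetical choices for illustration. The sketch shows how swapping the goal vector, while keeping the weights and the input fixed, changes which channels are amplified.

```python
import numpy as np


def modulatory_feedback(dest, source_vec, W, b=None):
    """Scale each channel of an earlier layer by a gain computed from a later layer.

    dest:       (C_dst, H, W) feature map of the destination (earlier) layer
    source_vec: (C_src,) vector from the source (later) layer, or a goal vector
    W:          (C_dst, C_src) channel-to-channel feedback weights (hypothetical;
                hand-set here, whereas the models in the abstract learn them)
    b:          optional (C_dst,) bias
    """
    drive = W @ source_vec if b is None else W @ source_vec + b
    # Assumed nonlinearity: 2 * sigmoid keeps gains in (0, 2); zero drive gives gain 1
    gain = 2.0 / (1.0 + np.exp(-drive))
    # Spatial broadcast: one gain per channel, applied at every location
    return dest * gain[:, None, None]


def category_prototypes(outputs, labels, n_classes):
    """Hypothetical prototype goal vectors: unit-normalized mean output per category."""
    protos = np.stack([outputs[labels == k].mean(axis=0) for k in range(n_classes)])
    return protos / np.linalg.norm(protos, axis=1, keepdims=True)


# --- Toy setup (all values hypothetical) ---
rng = np.random.default_rng(0)
n_classes = 3                      # e.g. 0 = "face", 1 = "scene", 2 = "key"
labels = np.repeat(np.arange(n_classes), 20)
# Simulated output-layer vectors: a noisy one-hot code for each image's category
outputs = np.eye(n_classes)[labels] + 0.1 * rng.standard_normal((labels.size, n_classes))
goals = category_prototypes(outputs, labels, n_classes)

# Feedback from the 3-unit output layer to a 4-channel earlier layer.
# Channel 0 behaves "face-selective", channel 1 "scene-selective"; channel 3 gets no feedback.
W_fb = np.array([[ 3.0, -3.0, 0.0],
                 [-3.0,  3.0, 0.0],
                 [ 0.0,  0.0, 3.0],
                 [ 0.0,  0.0, 0.0]])

composite = np.ones((4, 5, 5))     # stand-in for a fixed face-scene composite feature map
attend_face = modulatory_feedback(composite, goals[0], W_fb)
attend_scene = modulatory_feedback(composite, goals[1], W_fb)

for name, fmap in [("face", attend_face), ("scene", attend_scene)]:
    # Mean activation per channel after goal-driven modulation
    print(name, np.round(fmap.mean(axis=(1, 2)), 2))
```

With the same fixed weights and the same input, the "face" goal amplifies channel 0 and suppresses channel 1, and the "scene" goal does the reverse; channel 3, which receives no feedback, keeps a gain of exactly 1. Using a multiplicative, spatially uniform gain (rather than an additive signal) is what lets a single goal vector reweight features everywhere in the image without changing which locations respond.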
