August 2023, Volume 23, Issue 9
Open Access
Vision Sciences Society Annual Meeting Abstract
Reconstruction-guided attention improves the object recognition robustness of neural networks
Author Affiliations
  • Seoyoung Ahn
    Stony Brook University
  • Hossein Adeli
    Stony Brook University
  • Gregory Zelinsky
    Stony Brook University
Journal of Vision, August 2023, Vol. 23, 5129. doi: https://doi.org/10.1167/jov.23.9.5129

      Seoyoung Ahn, Hossein Adeli, Gregory Zelinsky; Reconstruction-guided attention improves the object recognition robustness of neural networks. Journal of Vision 2023;23(9):5129. https://doi.org/10.1167/jov.23.9.5129.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Many visual phenomena suggest that humans use top-down generative or reconstructive processes to create visual percepts (e.g., imagery, object completion, pareidolia), but little is known about the role reconstruction plays in robust object recognition. We built an iterative encoder-decoder network that generates an object reconstruction and uses it as top-down attentional feedback to route the most relevant spatial and feature information to feed-forward object recognition processes. We tested this model on two challenging out-of-distribution object recognition datasets: MNIST-C (handwritten digits under corruption) and IMAGENET-C (real-world objects under corruption). Our model showed strong generalization across a wide range of image corruptions and significantly outperformed feedforward convolutional neural network models (e.g., ResNet) on both datasets. The model's robustness was particularly pronounced under high levels of distortion, where it achieved up to a 20% accuracy improvement over the baseline model in the maximally noisy IMAGENET-C conditions. Ablation studies further revealed two complementary roles of spatial and feature-based attention in robust object recognition: the former is largely consistent with the spatial masking benefits reported in the attention literature (the reconstruction serves as a mask), while the latter mainly improves the model's inference speed (i.e., the number of time steps needed to reach a given confidence threshold) by reducing the space of possible object hypotheses. Finally, the proposed model also shows high behavioral correspondence with humans, which we evaluated by the correlation between human and model response times (Spearman's r = 0.36, p < .001) and by the types of errors made. By infusing an AI model with a powerful attention mechanism, we show how reconstruction-based feedback can be used to explore the role of generation in human visual perception.
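
The iterative feedback loop described above can be made concrete in code. The following is a minimal, hypothetical sketch, not the authors' implementation: it assumes PyTorch, illustrative module names (encoder, decoder, classifier) and hyperparameters, and implements only the spatial-masking pathway, in which the normalized reconstruction gates the input on each pass and iteration stops once a confidence threshold is reached. The feature-based attention pathway described in the abstract is omitted for brevity.

```python
# Hypothetical sketch of reconstruction-guided attention; not the authors' code.
# Module names (encoder, decoder, classifier) and defaults are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReconstructionGuidedRecognizer(nn.Module):
    def __init__(self, encoder: nn.Module, decoder: nn.Module,
                 classifier: nn.Module, max_steps: int = 8,
                 confidence: float = 0.9):
        super().__init__()
        self.encoder = encoder        # image -> latent features
        self.decoder = decoder        # latent features -> object reconstruction
        self.classifier = classifier  # latent features -> class logits
        self.max_steps = max_steps
        self.confidence = confidence

    def forward(self, image: torch.Tensor):
        attended = image
        for step in range(self.max_steps):
            features = self.encoder(attended)
            logits = self.classifier(features)
            probs = F.softmax(logits, dim=-1)
            # Stop once every item in the batch is sufficiently confident;
            # the number of steps taken is the model's "inference speed".
            if probs.max(dim=-1).values.min() >= self.confidence:
                break
            # Spatial attention: the top-down reconstruction acts as a soft
            # mask that routes object-relevant pixels back into the next
            # feed-forward pass.
            recon = self.decoder(features)
            mask = recon / (recon.amax(dim=(-2, -1), keepdim=True) + 1e-8)
            attended = image * mask
        return logits, step + 1
```

Under these assumptions, a corrupted image that is ambiguous on the first feed-forward pass is re-processed through increasingly object-focused inputs, and the returned (logits, steps) pair exposes both the final decision and the time-step count used for the response-time comparison with humans.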
