December 2022
Volume 22, Issue 14
Open Access
Vision Sciences Society Annual Meeting Abstract
Using Object Reconstruction as a Dynamic Attention Window to Improve Recognition Robustness
Author Affiliations
  • Seoyoung Ahn
    Department of Psychology, Stony Brook University, NY 11790, USA
  • Hossein Adeli
    Department of Psychology, Stony Brook University, NY 11790, USA
  • Gregory Zelinsky
    Department of Psychology, Stony Brook University, NY 11790, USA
    Department of Computer Science, Stony Brook University, NY 11790, USA
Journal of Vision December 2022, Vol.22, 3692. doi:https://doi.org/10.1167/jov.22.14.3692
      Seoyoung Ahn, Hossein Adeli, Gregory Zelinsky; Using Object Reconstruction as a Dynamic Attention Window to Improve Recognition Robustness. Journal of Vision 2022;22(14):3692. https://doi.org/10.1167/jov.22.14.3692.

      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Humans can partially reconstruct visual information, as evidenced by our ability to imagine and dream, yet it is debated whether a reconstruction process is functionally used in online visual perception. We focus on visual object recognition and propose that reconstruction creates initial hypotheses about an object’s shape and location and serves as an attentional window, effectively restricting visual encoding to the image region depicting the features needed for object recognition. To test this hypothesis, we built an iterative encoder-decoder system in which an object reconstruction is generated by the decoder and then fed back to the encoder to mask the image region to be processed in the next step. We tested the model’s recognition performance on MNIST-C, a challenging digit-recognition benchmark in which 15 different types of corruption are applied to handwritten digit images. Our model outperformed other models specifically designed for out-of-distribution generalization, e.g., adversarially trained models. Ablation studies also confirmed that using an object-reconstruction mask during encoding significantly increases model robustness compared to a model that merely learns to reconstruct an object without using the reconstruction as a mask. Analyzing performance across the corruption types in MNIST-C revealed that the object-reconstruction mask is especially helpful for shape-oriented recognition, rendering the system more resilient to texture perturbations, e.g., an image corrupted with fog or salt-and-pepper noise. One vulnerability of our method is evidenced by the (infrequent) cases in which the initial object reconstruction is incorrect, leading to a reconstruction of the wrong object and a predicted visual hallucination. We discuss this problem and propose using the mismatch between the visual input and the reconstruction as an error signal to obtain even more robust and veridical object representations.
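The iterative reconstruct-then-mask loop described in the abstract can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' architecture: `blur` is a hypothetical stand-in for the learned decoder, the mask update is a simple normalization rather than a trained operation, and all names, shapes, and parameters are illustrative.

```python
import numpy as np

def blur(x):
    """3x3 box blur: a placeholder 'decoder' producing a smooth reconstruction.
    (Hypothetical stand-in for a learned reconstruction network.)"""
    p = np.pad(x, 1)
    h, w = x.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def iterative_recognition(image, steps=3):
    mask = np.ones_like(image)               # first pass encodes the whole image
    for _ in range(steps):
        encoded = image * mask               # mask restricts what the encoder sees
        recon = blur(encoded)                # decoder's hypothesis about the object
        mask = recon / (recon.max() + 1e-8)  # reconstruction becomes the next window
    return encoded, mask

rng = np.random.default_rng(0)
digit = np.zeros((8, 8))
digit[2:6, 3:5] = 1.0                        # toy vertical "digit" stroke
noisy = digit + 0.2 * rng.random((8, 8))     # texture-like corruption everywhere
encoded, mask = iterative_recognition(noisy)
# Over iterations the mask concentrates on the stroke region,
# suppressing the background corruption during encoding.
```

The key design point the sketch captures is the feedback loop: the decoder's reconstruction is not just an output but is routed back to gate the encoder's input on the next step, so corrupted background pixels receive progressively less weight.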
