September 2019
Volume 19, Issue 10
Open Access
Vision Sciences Society Annual Meeting Abstract  |   September 2019
Eccentricity Dependent Neural Network with Recurrent Attention for Scale, Translation and Clutter Invariance
Author Affiliations & Notes
  • Jiaxuan Zhang
    Singapore University of Technology and Design
    Columbia University
  • Yena Han
    Massachusetts Institute of Technology
  • Tomaso Poggio
    Massachusetts Institute of Technology
  • Gemma Roig
    Singapore University of Technology and Design
    Massachusetts Institute of Technology
Journal of Vision September 2019, Vol.19, 209. doi:https://doi.org/10.1167/19.10.209

Jiaxuan Zhang, Yena Han, Tomaso Poggio, Gemma Roig; Eccentricity Dependent Neural Network with Recurrent Attention for Scale, Translation and Clutter Invariance. Journal of Vision 2019;19(10):209. doi: https://doi.org/10.1167/19.10.209.

© ARVO (1962-2015); The Authors (2016-present)

Abstract

The human visual system perceives the environment by integrating multiple eye fixations at different locations in the scene. Within each fixation, perceived resolution is highest at the center of the visual field and lower at the periphery, because the receptive field size of neurons in the retina and early visual cortex increases with eccentricity, i.e., with distance from the fixation point. This eccentricity dependence of receptive field size has been argued to give the visual system invariance to scale and background clutter for object recognition, whereas the eye-fixation mechanism provides invariance to object position. To further test this hypothesis, we propose a novel computational approach that integrates the Eccentricity Dependent Neural Network (ENN) with the Recurrent Attention Model (RAM). ENN, a recently introduced computational model of the visual cortex, processes the input in multiple scale channels whose receptive field sizes change with eccentricity, which builds intrinsic scale invariance into the model. RAM uses an attention mechanism trained with reinforcement learning that learns to fixate on different parts of the visual input at different time steps. In the combined model, RAM selects the best location to fixate on at each time step, and that location is then used as the center of the input to ENN. We conducted extensive experiments on the MNIST dataset, training and testing on images of digits at different scales and positions, to compare the proposed system, ENN-RAM, with the original RAM. Our experimental results reveal that, with less training data, ENN-RAM generalizes to different scales, i.e., it recognizes objects at scales different from those seen during training. We also observe that ENN-RAM is robust to clutter even when trained without clutter, whereas vanilla RAM is not.
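The kind of input ENN receives at each fixation can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `eccentricity_glimpse`, window sizes that double per channel (`base_size * 2**k`), zero padding at the image border, and average pooling are all assumptions chosen for illustration, loosely following the multi-scale glimpse idea used in RAM. Each scale channel covers a wider window around the fixation point but is pooled to the same `base_size` by `base_size` grid, so a unit in an outer channel summarizes a larger image patch, mimicking receptive fields that grow with eccentricity.

```python
import numpy as np


def eccentricity_glimpse(image, center, base_size=8, num_scales=3):
    """Hypothetical multi-scale glimpse centered on a fixation point.

    Channel k crops a (base_size * 2**k)-pixel square window around `center`
    and average-pools it down to base_size x base_size, so each output unit
    in channel k summarizes a 2**k x 2**k patch: image regions farther from
    the fixation point are seen only by coarser channels.
    """
    if base_size % 2:
        raise ValueError("base_size must be even")
    h, w = image.shape
    cy, cx = center
    if not (0 <= cy < h and 0 <= cx < w):
        raise ValueError("fixation must lie inside the image")

    # Zero-pad by half of the largest window so every crop stays in bounds.
    pad = base_size * 2 ** (num_scales - 1) // 2
    padded = np.pad(image, pad)
    cy, cx = cy + pad, cx + pad

    channels = []
    for k in range(num_scales):
        factor = 2 ** k
        half = base_size * factor // 2
        window = padded[cy - half:cy + half, cx - half:cx + half]
        # Average-pool factor x factor blocks back to base_size x base_size.
        pooled = window.reshape(base_size, factor, base_size, factor).mean(axis=(1, 3))
        channels.append(pooled)
    return np.stack(channels)  # shape: (num_scales, base_size, base_size)


# Example: an 8x4 bar of ones in a 28x28 (MNIST-sized) image, fixated at (14, 14).
img = np.zeros((28, 28))
img[10:18, 12:16] = 1.0
glimpse = eccentricity_glimpse(img, (14, 14))
print(glimpse.shape)  # (3, 8, 8)
print(glimpse.sum(axis=(1, 2)))  # per-channel sums: 32, 8 and 2
```

In a combined ENN-RAM-style loop, `center` would be supplied at each time step by the attention policy rather than fixed by hand; here it is hard-coded only to show how one fixation maps to a stack of scale channels.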

Acknowledgement: This work was funded by the MOE SUTD SRG grant (SRG ISTD 2017 131) and CBMM NSF STC award CCF-1231216. 