Vision Sciences Society Annual Meeting Abstract | September 2019
Volume 19, Issue 10 | Open Access
Properties of invariant object recognition in human one-shot learning suggest a hierarchical architecture different from deep convolutional neural networks
Author Affiliations & Notes
  • Yena Han
    MIT Center for Brains, Minds and Machines
  • Gemma Roig
    MIT Center for Brains, Minds and Machines
    Singapore University of Technology and Design
  • Gad Geiger
    MIT Center for Brains, Minds and Machines
  • Tomaso A Poggio
    MIT Center for Brains, Minds and Machines
Journal of Vision September 2019, Vol. 19, 28d. https://doi.org/10.1167/19.10.28d
Abstract

Recognition that is invariant to transformations can be a significant advantage for a visual system. It is important, however, to distinguish between intrinsic invariance, due to the underlying representation, and example-based invariance for familiar objects that have previously been seen under different viewpoints. To characterize invariance in humans, we conducted psychophysical experiments measuring invariant object recognition performance in a one-shot learning scheme. We report tolerance to scale and position changes, measured as recognition accuracy for Korean letters presented in a flash to non-Korean subjects who had no previous experience with Korean letters. We found that humans show significant scale invariance at the center of the visual field after only a single exposure to a novel object. The degree of translation invariance is limited and depends on the size and position of the objects. We represent the range of invariance as the window of invariance and compare it with the window of visibility, which is obtained by testing Korean subjects under the same experimental conditions as in the one-shot learning task. This comparison revealed that the window of invariance lies within the window of visibility. In addition, to understand the brain computation underlying these invariance properties, we compared the experimental data with computational modeling results. As the computational model, we tested an Eccentricity-dependent Neural Network (ENN), which we hypothesized exploits the intrinsic invariance properties observed in the human experiments. Our modeling results suggest that, to explain invariant recognition by humans, artificial neural networks need to explicitly incorporate built-in scale invariance by encoding different scale channels, as well as an eccentricity-dependent representation in which receptive field sizes grow with eccentricity, as in the ENN. Our psychophysical experiments and the related simulations strongly suggest that the human visual system uses a computational strategy different from current deep convolutional architectures, one that is more data efficient and relies strongly on eye movements.
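To make the two architectural ingredients concrete, here is a minimal sketch in PyTorch of a multi-scale, eccentricity-dependent model. It is an illustration under stated assumptions, not the authors' ENN: the class name EccentricityDependentNet, the crop fractions, and the inverted-pyramid center-crop sampling are all hypothetical choices. It demonstrates (1) several scale channels processed by one shared convolutional trunk, giving built-in scale tolerance, and (2) coarser resampling of the periphery than of the fovea, a crude stand-in for receptive fields that grow with eccentricity.

```python
# Minimal sketch (assumes PyTorch). Illustrative only; the real ENN differs in detail.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EccentricityDependentNet(nn.Module):
    """Toy eccentricity-dependent network: each 'scale channel' is a center
    crop of the image resampled to one fixed resolution, so small crops see
    the fovea at high resolution and large crops see the periphery coarsely.
    A single shared convolutional trunk processes all channels."""

    def __init__(self, num_classes: int = 10, crop_fractions=(0.25, 0.5, 1.0),
                 channel_size: int = 32):
        super().__init__()
        self.crop_fractions = crop_fractions  # fraction of image side kept per channel
        self.channel_size = channel_size      # common resolution after resampling
        self.trunk = nn.Sequential(           # weights shared across scale channels
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveMaxPool2d(1),          # global max-pool within a channel
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):                     # x: (B, 1, H, W), fixation at center
        B, _, H, W = x.shape
        feats = []
        for f in self.crop_fractions:
            h, w = int(H * f), int(W * f)
            top, left = (H - h) // 2, (W - w) // 2
            crop = x[:, :, top:top + h, left:left + w]
            crop = F.interpolate(crop, size=(self.channel_size, self.channel_size),
                                 mode="bilinear", align_corners=False)
            feats.append(self.trunk(crop).flatten(1))      # (B, 32) per channel
        pooled = torch.stack(feats, dim=0).max(dim=0).values  # max over scale channels
        return self.classifier(pooled)

model = EccentricityDependentNet()
logits = model(torch.randn(4, 1, 128, 128))  # four fixated 128x128 stimuli
print(logits.shape)                          # torch.Size([4, 10])
```

Taking a max over the scale channels makes the output tolerant to which channel best matches the stimulus scale, one simple way to realize the built-in scale invariance the abstract argues for.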

Acknowledgement: The Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216. Y. Han is a recipient of the Samsung Scholarship.