Abstract
Recognition invariant to transformations can be a significant advantage for a visual system. It is important, however, to distinguish between intrinsic invariance due to the underlying representation and example-based invariance for familiar objects that have previously been seen under different viewpoints. To characterize invariance in humans, we conducted psychophysical experiments measuring invariant object recognition performance in a one-shot learning scheme. We report tolerance to scale and position changes by analyzing the recognition accuracy of Korean letters presented in a flash to non-Korean subjects who had no previous experience with Korean letters. We found that humans show significant scale-invariance at the center of the visual field after only a single exposure to a novel object. The degree of translation-invariance is limited and depends on the size and position of the objects. We represent the range of invariance as the window of invariance and compare it with the window of visibility, which is obtained by testing Korean subjects under the same experimental conditions as in the one-shot learning task. This comparison revealed that the window of invariance lies within the window of visibility. In addition, to understand the brain computation underlying these invariance properties, we compared the experimental data with computational modeling results. As the computational model, we tested an Eccentricity-dependent Neural Network (ENN), which we hypothesized exploits the intrinsic invariance properties observed in the human experiments. Our modeling results suggest that, to explain invariant recognition by humans, artificial neural networks need to explicitly incorporate built-in scale-invariance by encoding different scale channels, as well as an eccentricity-dependent representation in which neurons' receptive field sizes change with eccentricity, as in the ENN. Our psychophysical experiments and related simulations strongly suggest that the human visual system uses a computational strategy different from current convolutional deep learning architectures, one that is more data-efficient and relies strongly on eye movements.
Acknowledgement: This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216. Y. Han is a recipient of a Samsung Scholarship.
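The architectural idea summarized above (multiple scale channels combined with an eccentricity-dependent representation) can be illustrated with a minimal sketch. The code below is an assumption-laden illustration, not the authors' ENN implementation: it approximates eccentricity-dependent sampling with a crop-and-resize pyramid centered at fixation, runs each scale channel through a shared convolutional encoder, and max-pools over scales to obtain a scale-tolerant representation. The framework (PyTorch), module names, channel counts, and crop fractions are all hypothetical choices made for the example.

# Minimal sketch of a multi-scale, eccentricity-inspired network
# (hypothetical; not the authors' ENN implementation). Assumes PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleChannelNet(nn.Module):
    """Central crop-and-resize scale channels, a shared encoder, and
    max pooling over scales for a scale-tolerant representation."""
    def __init__(self, n_classes=30, crop_fracs=(1.0, 0.5, 0.25), out_size=32):
        super().__init__()
        self.crop_fracs = crop_fracs      # fraction of the image kept per channel
        self.out_size = out_size          # all channels resampled to this size
        self.encoder = nn.Sequential(     # weights shared across scale channels
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):                 # x: (batch, 1, H, W), fixation at center
        feats = []
        _, _, H, W = x.shape
        for frac in self.crop_fracs:
            h, w = int(H * frac), int(W * frac)
            top, left = (H - h) // 2, (W - w) // 2
            crop = x[:, :, top:top + h, left:left + w]        # central crop
            crop = F.interpolate(crop, size=self.out_size,
                                 mode="bilinear", align_corners=False)
            feats.append(self.encoder(crop).flatten(1))       # (batch, 32)
        pooled = torch.stack(feats, dim=0).max(dim=0).values  # max over scales
        return self.classifier(pooled)

# Usage: a batch of 64x64 grayscale "letter" images fixated at the center.
model = ScaleChannelNet(n_classes=30)
logits = model(torch.randn(8, 1, 64, 64))
print(logits.shape)  # torch.Size([8, 30])

Pooling the shared-encoder responses across scale channels is one simple way to make the output insensitive to the size of a centrally fixated stimulus, which is the kind of built-in scale tolerance the abstract argues is needed to account for the human data.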