Moqian Tian, Kalanit Grill-Spector; Spatio-temporal information is not necessary for generating view-point invariant object recognition during unsupervised learning. Journal of Vision 2013;13(9):263. doi: https://doi.org/10.1167/13.9.263.
To achieve view-invariant object recognition, it is necessary to bind different 2D views of an object to the same 3D representation. One proposed mechanism is unsupervised learning, in which subjects see smoothly changing views of an object; the resulting spatial and temporal correlations between views are thought to bind them (Wallis et al. 2001). However, it is unknown whether spatio-temporally ordered exposure to object views is necessary for generating view-invariant recognition. To address this question, we conducted three experiments testing subjects' ability to learn to discriminate novel 3D computer-generated objects across views. During unsupervised learning, subjects were exposed to 24 views of novel objects (12 times each). Views of an object appeared either in sequential or random order. Pre- and post-training (~10 minutes apart), subjects performed a discrimination task, judging whether two views 90° apart were of the same object or of different objects. In Experiment 1, 14 subjects learned to discriminate views of 3D objects rotated in the image plane. We found a strikingly fast and significant learning effect. Surprisingly, we found no differences in performance between sequential and random learning. In Experiment 2, 20 subjects learned to discriminate views of 3D objects rotated in depth around the vertical axis. Again, we found significant learning and no differences between sequential and random learning. In Experiment 3, we tested whether implied motion was key to learning. 10 subjects were trained similarly, but for half of the objects, masks were placed between consecutive images to prevent implied motion. We found significant learning effects across all conditions, no differences between masked and unmasked conditions, and significantly greater improvement for sequential than for random training. Overall, our experiments show that unsupervised exposure to 2D views of objects is sufficient to generate view-invariant recognition.
Meeting abstract presented at VSS 2013