August 2014
Volume 14, Issue 10
Vision Sciences Society Annual Meeting Abstract  |   August 2014
Viewpoint invariant object recognition: Spatiotemporal information during unsupervised learning enhances generalization
Author Affiliations
  • Moqian Tian
    Psychology Dept., Stanford Univ., Stanford, CA
  • Kalanit Grill-Spector
    Psychology Dept., Stanford Univ., Stanford, CA
Journal of Vision August 2014, Vol.14, 1305. doi:

      Moqian Tian, Kalanit Grill-Spector; Viewpoint invariant object recognition: Spatiotemporal information during unsupervised learning enhances generalization. Journal of Vision 2014;14(10):1305.

      © ARVO (1962-2015); The Authors (2016-present)

View invariant object recognition requires both binding multiple 2D views of an object and discriminating among different objects that are highly similar. However, it is unclear what information is learned during unsupervised learning to enable this ability. It has been hypothesized that spatiotemporal continuity between views during learning may be key for binding object views to a single mental representation. We investigated this hypothesis across four experiments testing subjects' ability to discriminate among novel 3D objects across rotation, before and after training under two conditions: sequential, in which subjects were presented with 24 views of an object spanning 180° in sequential order, providing spatiotemporal continuity; and random, in which subjects were presented with the same views in random order. Subjects showed significant improvement after training in discriminating views of 3D objects rotated in the image plane (Experiment 1, n=14, ΔAccuracy=27.6±1.5%) or in depth (Experiment 2, n=20, ΔAccuracy=21.3±2.2%). Surprisingly, we found no differences in performance between sequential and random learning. In Experiment 3, we tested whether implied motion serves as a cue to bind views by comparing training as before to training with masks placed between consecutive images, which reduces the implied motion. We found significant learning effects across all conditions (n=20, ΔAccuracy=21.0±3.4%), but no difference between masked and unmasked presentations. Finally, in Experiment 4 we tested subjects' ability to generalize their learning to new object views. Subjects were trained with seven views spanning 180° and tested on untrained views interpolated between the trained views. Results revealed that sequential learning improved generalization performance significantly more than random learning (ΔAccuracy sequential=18.5±2.6%, ΔAccuracy random=9.2±2.6%, n=26).
Overall, our data show that spatiotemporal information during unsupervised learning is not necessary for view invariant recognition, but can lead to better generalization when training with a small number of views.

Meeting abstract presented at VSS 2014

