September 2011
Volume 11, Issue 11
Vision Sciences Society Annual Meeting Abstract  |   September 2011
Recognizing objects, faces, and flowers using fixations
Author Affiliations
  • Christopher Kanan
    Department of Computer Science and Engineering, University of California San Diego, USA
  • Garrison Cottrell
    Department of Computer Science and Engineering, University of California San Diego, USA
Journal of Vision September 2011, Vol.11, 894. doi:

Most biologically-inspired and computer vision systems are purely feedforward: they immediately classify an object after observing it, even if the model's confidence is low. We have developed a biologically-inspired model called NIMBLE that actively classifies objects (Kanan & Cottrell, 2010). NIMBLE is based on two characteristics of the primate visual system: (1) sparse visual filtering and (2) sequential fixation-based visual attention. We learn sparse visual filters from natural image patches using Independent Component Analysis (ICA), which produces V1-like filters. The distributions of each ICA filter's responses are fitted by generalized Gaussian distributions. Using these statistics, we compute a saliency map based on rare feature values (Zhang et al., 2008). In training, the saliency map is randomly sampled to generate simulated fixations, and NIMBLE records the responses of spatially pooled V1 features from the fixation region in a class-conditional non-parametric density model. In recognition, fixations are acquired the same way, and used to update a posterior probability distribution over the categories. We extract and store 100 fixations from each image in the training set, and use 100 fixations on test images. Although NIMBLE only uses one feature type, it exceeds state-of-the-art computer vision models when the number of training instances is small, and performs comparably with methods using four or more feature types (e.g., multiple kernel learning). NIMBLE achieves 78.5% mean per-class accuracy on objects (Caltech-101) using 30 training images per class (chance is 1/101), 92.7% accuracy on faces (AR) using one training image per person (chance is 1/120), and 71.4% accuracy on 102 Flowers (chance is 1/102). We use the same parameter settings across datasets, demonstrating that NIMBLE, like primates, does not need to retune parameters for each problem.
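The pipeline described above (rarity-based saliency, fixation sampling, and sequential posterior updates) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It rests on several assumptions that the abstract does not state: the filter responses are treated as independent, the additive normalizing constant of the generalized Gaussian is dropped so that saliency stays non-negative, the non-parametric density model is a Gaussian kernel density estimate with a single hypothetical `bandwidth`, and fixations are treated as conditionally independent given the class. All function names are hypothetical, and the pooling of V1-like features around each fixation is left out: the classifier simply takes one feature vector per fixation.

```python
import numpy as np

def saliency_map(responses, scales, shapes):
    """Rarity-based saliency from filter responses of shape (H, W, K).

    Each filter k is modeled as a zero-mean generalized Gaussian,
    p(x) proportional to exp(-|x / scale_k| ** shape_k). Summing the negative
    log densities (constants dropped, independence assumed) makes rare
    responses salient.
    """
    return (np.abs(responses / scales) ** shapes).sum(axis=-1)

def sample_fixations(saliency, n, rng):
    """Draw n (row, col) fixations with probability proportional to saliency."""
    p = saliency.ravel() / saliency.sum()
    idx = rng.choice(p.size, size=n, p=p)
    return np.column_stack(np.unravel_index(idx, saliency.shape))

def log_kde(query, exemplars, bandwidth):
    """Log Gaussian kernel density at `query`, up to a class-independent constant."""
    a = -((exemplars - query) ** 2).sum(axis=1) / (2.0 * bandwidth ** 2)
    m = a.max()  # log-sum-exp trick for numerical stability
    return m + np.log(np.exp(a - m).mean())

def classify(fixation_features, class_exemplars, bandwidth=1.0):
    """Update a posterior over classes one fixation at a time.

    class_exemplars[c] holds the stored fixation features of class c, shape (N_c, D).
    Starts from a uniform prior and returns the final posterior.
    """
    log_post = np.zeros(len(class_exemplars))
    for f in fixation_features:
        log_post += np.array([log_kde(f, ex, bandwidth) for ex in class_exemplars])
        log_post -= np.logaddexp.reduce(log_post)  # renormalize after each fixation
    return np.exp(log_post)
```

With stored exemplars for each class, `classify` returns a probability vector that sharpens as more fixations are accumulated, which is the sense in which such a model classifies actively rather than in a single feedforward pass.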

This work was supported by the James S. McDonnell Foundation (Perceptual Expertise Network, I. Gauthier, PI) and the NSF (grant #SBE-0542013 to the Temporal Dynamics of Learning Center, G.W. Cottrell, PI, and IGERT grant #DGE-0333451 to G.W. Cottrell/V.R. de Sa).
