Abstract
Most biologically inspired models and computer vision systems are purely feedforward: they classify an object immediately after observing it, even when the model's confidence is low. We have developed a biologically inspired model called NIMBLE that classifies objects actively (Kanan & Cottrell, 2010). NIMBLE is based on two characteristics of the primate visual system: (1) sparse visual filtering and (2) sequential, fixation-based visual attention. We learn sparse visual filters from natural image patches using Independent Component Analysis (ICA), which produces V1-like filters. The distribution of each ICA filter's responses is fit with a generalized Gaussian distribution. Using these statistics, we compute a saliency map based on rare feature values (Zhang et al., 2008). During training, the saliency map is sampled at random to generate simulated fixations, and NIMBLE records the responses of spatially pooled V1 features from each fixation region in a class-conditional non-parametric density model. During recognition, fixations are acquired in the same way and used to update a posterior probability distribution over the categories. We extract and store 100 fixations from each training image and use 100 fixations on each test image. Although NIMBLE uses only one feature type, it exceeds state-of-the-art computer vision models when the number of training instances is small, and it performs comparably to methods that use four or more feature types (e.g., multiple kernel learning). NIMBLE achieves 78.5% mean per-class accuracy on objects (Caltech-101) using 30 training images per class (chance is 1/101), 92.7% accuracy on faces (AR) using one training image per person (chance is 1/120), and 71.4% accuracy on 102 Flowers (chance is 1/102). We use the same parameter settings across all datasets, demonstrating that NIMBLE, like primates, does not need its parameters retuned for each problem.
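To make the two core computations concrete, the sketch below illustrates (not the authors' code) rarity-based saliency, where saliency is inversely related to feature probability as in Zhang et al. (2008), and the sequential posterior update over categories, here implemented with a simple Gaussian kernel density estimate standing in for the class-conditional non-parametric density model and with fixations treated as conditionally independent given the class. All function names, the bandwidth parameter, and the toy data are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def rarity_saliency(feature_log_probs):
    """Saliency from feature rarity: a location is salient when its
    features are improbable, i.e., saliency = -log p(features)."""
    return -feature_log_probs

def sample_fixations(saliency, n_fixations):
    """Draw fixation locations at random, weighted by the saliency map."""
    p = saliency - saliency.min()
    p = p / p.sum()
    return rng.choice(saliency.size, size=n_fixations, p=p)

def log_gaussian_kde(x, stored, bandwidth):
    """Log-density of fixation features x under a Gaussian KDE built from
    one class's stored training fixations (a stand-in for NIMBLE's
    class-conditional density model)."""
    sq_dists = np.sum((stored - x) ** 2, axis=1)
    d = x.size
    log_kernels = (-sq_dists / (2 * bandwidth**2)
                   - 0.5 * d * np.log(2 * np.pi * bandwidth**2))
    return np.logaddexp.reduce(log_kernels) - np.log(len(stored))

def classify(fixation_features, class_stores, bandwidth=1.0):
    """Update a posterior over categories one fixation at a time,
    starting from a uniform prior."""
    n_classes = len(class_stores)
    log_post = np.full(n_classes, -np.log(n_classes))
    for f in fixation_features:
        log_lik = np.array([log_gaussian_kde(f, store, bandwidth)
                            for store in class_stores])
        log_post += log_lik
        log_post -= np.logaddexp.reduce(log_post)  # renormalize
    return np.exp(log_post)

# Toy usage: simulated saliency sampling, then classification of 10
# fixations' features against two classes with 100 stored fixations each.
log_probs = rng.normal(size=64)                 # pretend per-location log p
fix_idx = sample_fixations(rarity_saliency(log_probs), n_fixations=5)
stores = [rng.normal(0.0, 1.0, size=(100, 8)),  # class 0 fixation store
          rng.normal(2.0, 1.0, size=(100, 8))]  # class 1 fixation store
test_fix = rng.normal(2.0, 1.0, size=(10, 8))   # resembles class 1
print(classify(test_fix, stores))               # posterior favors class 1
```

In this simplification, accumulating log-likelihoods across fixations corresponds to the posterior update described above: each new fixation either sharpens or revises the model's belief before the next fixation is taken.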
This work was supported by the James S. McDonnell Foundation (Perceptual Expertise Network, I. Gauthier, PI) and by the NSF (grant #SBE-0542013 to the Temporal Dynamics of Learning Center, G.W. Cottrell, PI, and IGERT grant #DGE-0333451 to G.W. Cottrell/V.R. de Sa).