Abstract
Visual search and recognition in humans employ a combination of bottom-up (data-driven) and top-down (goal-driven) processes. Although many bottom-up search and recognition models have been developed, the computational and neural basis of top-down biasing in such models has remained elusive. This paper develops a new model of attention guidance with a dual emphasis: a single common Bayesian representational framework is used (1) for learning how to bias and guide search towards desired targets, and (2) for recognizing targets once they are found. At its core, the model learns, for each object, probability distributions over the values its visual appearance takes along a number of low-level visual feature dimensions, then uses this learned knowledge both to locate and to recognize desired objects. The model is tested on three publicly available datasets, ALOI, COIL and SOIL47, containing photographs of 1,000, 100 and 47 objects, respectively, taken under many viewpoints and illuminations (117,174 images in total). Recognition performance is compared to that of two state-of-the-art object recognition models (SIFT and HMAX). The proposed model performs significantly better and faster, reaching an 89% classification rate (SIFT: 25%, HMAX: 76%) when using 1/4 of the images for training and 3/4 for testing, while being 89 and 279 times faster than SIFT and HMAX, respectively. The proposed model can also be used for top-down guided search, finding a desired object in a 5x5 search array within 4 attempts on average (chance would be 12.5 attempts). Our results suggest that the simple Bayesian formalism developed here is capable of delivering robust machine vision performance.
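To make the core idea concrete, the following is a minimal sketch (not the authors' implementation) of how a single set of learned per-object feature distributions can serve both recognition and top-down search biasing. The independent-Gaussian (naive Bayes) assumption, the feature names, and all parameters here are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


class BayesianObjectModel:
    """Toy model: one Gaussian per low-level feature dimension per object."""

    def __init__(self):
        self.means = {}  # per-object mean feature vector
        self.vars = {}   # per-object per-feature variance

    def learn(self, obj_id, feature_vectors):
        """Fit independent Gaussians over the feature dimensions (naive Bayes)."""
        x = np.asarray(feature_vectors, dtype=float)
        self.means[obj_id] = x.mean(axis=0)
        self.vars[obj_id] = x.var(axis=0) + 1e-6  # avoid zero variance

    def log_likelihood(self, obj_id, features):
        """log p(features | object) under the independent-Gaussian assumption."""
        mu, var = self.means[obj_id], self.vars[obj_id]
        return -0.5 * np.sum(np.log(2 * np.pi * var) + (features - mu) ** 2 / var)

    def recognize(self, features):
        """Recognition: pick the object whose distribution best explains the features."""
        return max(self.means, key=lambda o: self.log_likelihood(o, features))

    def bias_map(self, target_id, feature_maps):
        """Top-down guidance: score each location by how likely its local
        features are under the desired target's learned distribution."""
        h, w, _ = feature_maps.shape
        flat = feature_maps.reshape(h * w, -1)
        scores = np.array([self.log_likelihood(target_id, f) for f in flat])
        return scores.reshape(h, w)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    model = BayesianObjectModel()
    # Two toy "objects", each described by 4 low-level feature values
    # (e.g. intensity, two color opponencies, edge energy -- illustrative only).
    model.learn("A", rng.normal([0.2, 0.5, 0.1, 0.8], 0.05, size=(20, 4)))
    model.learn("B", rng.normal([0.7, 0.1, 0.6, 0.3], 0.05, size=(20, 4)))

    # Recognition of a novel view of object A.
    print(model.recognize(np.array([0.21, 0.49, 0.12, 0.79])))  # expected: "A"

    # A 5x5 "search array" of local feature vectors; the bias map should peak
    # at the location whose features match object B's learned distribution.
    scene = rng.uniform(0, 1, size=(5, 5, 4))
    scene[3, 1] = [0.7, 0.1, 0.6, 0.3]
    bias = model.bias_map("B", scene)
    print(np.unravel_index(np.argmax(bias), bias.shape))  # expected: (3, 1)
```

The point of the sketch is the shared representation: the same learned distributions are queried once per candidate object for recognition and once per location for guiding search, which is the dual use the abstract describes.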
This work was supported by HFSP, NSF, DARPA, and NGA.