Abstract
Visual attention is a ubiquitous mechanism in sensory perception, especially in humans and other primates. Many computational models of attention derive saliency maps from bottom-up sensory processing. Shifts of attention are also thought to be generated top-down, through feature-dependent weighting of the various feature dictionaries. In this abstract, we propose an object segmentation framework that integrates bottom-up saliency maps with top-down attention. The top-down attention is triggered by a sparse, hierarchical model of visual cortex called PANN (Petascale Artificial Neural Network). Like the earlier HMAX and Neocognitron models, PANN is composed of coupled simple-cell and complex-cell layers. Simple cells build representations of features over successively larger receptive fields. Complex-cell layers pool the outputs of simple cells within the same layer to create feature representations that are increasingly viewpoint invariant. These representations are learned through a combination of two methods. Unsupervised learning lets the system acquire features that reflect the statistics of its realm of experience. Supervised learning associates labels with specific clusters of features at the top layer. Through the hierarchical network, the spatial support of a classifier's decision can be traced back down to the input image. This produces an informative map of which low-level image features were associated with the positive object foreground, and this object relevance map serves as a useful measure of top-down attention. We apply this object-based attention mechanism to detect specific objects in aerial video from a low-flying aircraft. We then segment the whole objects within the same video by fusing the relevance maps with bottom-up saliency maps. In this integrated scheme, the two signals support each other. Bottom-up saliency helps top-down attention segment objects intact, and top-down attention helps the saliency map filter out irrelevant background.
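The fusion step can be illustrated with a small sketch. It is not the PANN implementation. The thresholds, the 4-connected region growing, and the function names `normalize` and `fuse_segment` are hypothetical choices made for illustration only. The sketch shows the idea stated above. Pixels that the top-down relevance map marks strongly act as seeds. Each seed is then grown through connected bottom-up salient pixels. In this way saliency recovers the whole object, while salient regions with no top-down support are discarded.

```python
from collections import deque


def normalize(grid):
    """Rescale a 2-D list of floats to [0, 1]; a constant map becomes all zeros."""
    lo = min(min(row) for row in grid)
    hi = max(max(row) for row in grid)
    if hi == lo:
        return [[0.0 for _ in row] for row in grid]
    return [[(v - lo) / (hi - lo) for v in row] for row in grid]


def fuse_segment(saliency, relevance, sal_thresh=0.5, rel_thresh=0.8):
    """Hypothetical fusion rule: keep salient regions connected to a top-down seed.

    saliency  -- bottom-up saliency map (2-D list of floats)
    relevance -- top-down object relevance map of the same shape
    Returns a 2-D list of booleans (True = object foreground).
    """
    sal = normalize(saliency)
    rel = normalize(relevance)
    h, w = len(sal), len(sal[0])
    salient = [[sal[y][x] >= sal_thresh for x in range(w)] for y in range(h)]
    mask = [[False] * w for _ in range(h)]

    # Seeds: pixels strongly supported by the top-down relevance map.
    queue = deque((y, x) for y in range(h) for x in range(w)
                  if rel[y][x] >= rel_thresh)
    for y, x in queue:
        mask[y][x] = True

    # Breadth-first growth through 4-connected salient pixels, so bottom-up
    # saliency fills in the rest of the object around each seed.
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and salient[ny][nx] and not mask[ny][nx]:
                mask[ny][nx] = True
                queue.append((ny, nx))
    return mask


# Toy 3x5 example: two salient blobs, but only the left one has top-down support.
saliency = [
    [0.9, 0.9, 0.0, 0.0, 0.8],
    [0.9, 0.9, 0.0, 0.0, 0.8],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
relevance = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
mask = fuse_segment(saliency, relevance)
for row in mask:
    print("".join("#" if m else "." for m in row))
# ##...
# ##...
# .....
```

In the toy example, a single seed pixel recovers the full 2x2 salient blob around it. The salient blob in the right column is filtered out as background because no top-down evidence touches it. A multiplicative or weighted combination of the two maps would be an equally plausible fusion rule. Region growing is used here only because it makes the "intact object" effect easy to see.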
Meeting abstract presented at VSS 2012