Abstract
A widely accepted description of the primate visual cortex includes two information-processing pathways: a ventral (or "what") pathway that computes properties of object identity such as shape and color, and a dorsal (or "where") pathway that computes properties of the location of stimuli on the retina. Current state-of-the-art models of the visual cortex, including the well-known Neocognitron and HMAX models, concentrate exclusively on modeling the ventral pathway to deliver object identification. The fact that the two visual pathways are densely interconnected and interact in the primate visual cortex has been largely neglected. This work, in contrast, addresses how spatial ("where") and category ("what") representations could interconnect and interact via bidirectional processing to predict both the location and the identity of an object given a visual input. Our computational approach is based on a scalable network model that integrates a ventral pathway, dedicated to object identification, with a dorsal pathway, dedicated to object localization and segmentation. The network receives image stimuli together with coupled "what" labels and "where" labels as three external signals, forming a Y-shaped network built as a hierarchy of (deep hidden) cortical layers. The sparse coding model at each level is l0-constrained, which yields highly efficient online learning that does not require iterative steps to reach a fixed point of the sparse representation. The proposed sparse coding model can further be implemented in a divide-and-conquer manner, providing an effective way to learn this deep network with bidirectional connections. Preliminary results show that a small-scale version of the network can handle multiple objects slowly shifting across various natural backgrounds. These results demonstrate high accuracy in both attentional tuning and object recognition, as well as the interaction between them.
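The abstract does not spell out how the l0-constrained code is computed. As an illustration only, one common way to obtain a code with at most k nonzero entries in a single pass, with no iteration toward a fixed point, is to project the input onto the dictionary atoms and keep the k largest-magnitude responses. The sketch below assumes that scheme; the function name `topk_sparse_code`, the toy dictionary `atoms`, and the choice k = 2 are hypothetical and are not taken from the work described.

```python
from typing import List, Sequence


def topk_sparse_code(
    x: Sequence[float],
    dictionary: Sequence[Sequence[float]],
    k: int,
) -> List[float]:
    """Return a code with at most k nonzero entries (an l0 constraint).

    Assumption: hard thresholding of the projections, computed in one pass.
    This is an illustrative stand-in, not the authors' stated algorithm.
    """
    # Project the input onto every dictionary atom (one atom per row).
    projections = [
        sum(a_j * x_j for a_j, x_j in zip(atom, x)) for atom in dictionary
    ]
    # Indices of the k responses with the largest magnitude.
    ranked = sorted(
        range(len(projections)),
        key=lambda i: abs(projections[i]),
        reverse=True,
    )
    keep = set(ranked[:k])
    # Zero out every other coefficient.
    return [p if i in keep else 0.0 for i, p in enumerate(projections)]


# Toy example: four hypothetical unit-norm atoms in a 3-D input space.
atoms = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.6, 0.8, 0.0],
]
code = topk_sparse_code([0.2, 0.9, -0.1], atoms, k=2)
```

In this toy case only the second and fourth atoms stay active. The cost is one pass over the dictionary plus a sort, which is the kind of non-iterative encoding step that makes online learning cheap.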
Meeting abstract presented at VSS 2012