Abstract
Traditional artificial visual systems have assumed homogeneous sampling of the visual field and have generally failed to cope with natural or complex viewing situations. Visual neurophysiology, however, shows that visual sampling is highly non-uniform and that features extracted in early vision are combined in a salience map. The purpose of the salience map is to drive eye movements to relevant parts of the visual field, allowing high-resolution analysis of the fixated region by the fovea.
We have built a robotic system with an inhomogeneous retina. The visual system of the robot extracts features at different scales in different visual locations. Features such as color, spatial scale, and orientation are combined in a salience map. The salience map is altered by the task set in two distinct ways. First, the contribution of each feature map to the salience map is modulated according to which features are predictive of the target. Second, locations in which the target is more likely to occur receive a higher weighting. Foveal inspection of candidate locations then allows a decision about whether the target is present at that location. Importantly, because of the inhomogeneous retina, salience changes radically with each fixation, and the system appears more robust to complex scenes because detailed analysis occurs at only one location at a time.
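The two-stage task modulation described above can be sketched as a weighted combination of feature maps followed by a location prior; the most salient location then drives the next saccade. This is a minimal illustrative sketch, not the system's actual implementation: the function names, the dictionary-of-arrays representation, and the simple multiplicative prior are all assumptions for illustration.

```python
import numpy as np

def salience_map(feature_maps, feature_weights, location_prior):
    """Combine early-vision feature maps into a task-modulated salience map.

    feature_maps:    dict mapping feature name (e.g. "color", "orientation")
                     to a 2-D activation array over the visual field.
    feature_weights: dict mapping feature name to a scalar; features
                     predictive of the target get larger weights.
    location_prior:  2-D array giving extra weight to locations where the
                     target is more likely to occur.
    """
    combined = sum(feature_weights[name] * fmap
                   for name, fmap in feature_maps.items())
    return combined * location_prior

def next_fixation(salience):
    # The most salient location is the candidate for the next saccade;
    # foveal inspection at that point then decides if the target is present.
    return np.unravel_index(np.argmax(salience), salience.shape)

# Toy example: a 3x3 visual field with two features.
color = np.zeros((3, 3)); color[1, 2] = 1.0
orientation = np.zeros((3, 3)); orientation[0, 0] = 1.0
maps = {"color": color, "orientation": orientation}
weights = {"color": 2.0, "orientation": 1.0}   # color predicts the target
prior = np.ones((3, 3))                        # uniform location prior

sal = salience_map(maps, weights, prior)
print(next_fixation(sal))  # the color-feature location wins: (1, 2)
```

Down-weighting a location in the prior (setting its entry near zero) would redirect the fixation to the next-best candidate, mirroring how task-set location weighting reshapes the search.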
We shall demonstrate an implementation of the system that executes saccades and searches for a target in its natural environment.
Funded by EPSRC grant no. GR/S47953/01(P)