Based on neurophysiological and psychophysical findings, Koch and Ullman (1985) proposed the first version of a biologically plausible model of bottom-up overt attention: the saliency map. The model has since been implemented in several different ways, but its basic scheme has remained unchanged (for a review, see Itti & Koch, 2001). The stimulus is analyzed in separate feature channels such as luminance, color, orientation, or motion. Color is processed in two channels (red/green and blue/yellow) that mimic the color-opponent pathways of trichromatic primates. Within each feature channel, local center-surround differences are computed, combined across several spatial scales, and normalized nonlinearly. The resulting “conspicuity maps” (Itti, Koch, & Niebur, 1998) are then summed to yield the saliency map. Locations of high activity in this map are assumed to be salient, i.e., highly likely to be attended.

The model's success can be evaluated by how well it predicts the fixations of human observers. For still images, namely grayscale outdoor scenes (Peters, Iyer, Itti, & Koch, 2005) and colored fractals, home interiors, landscapes, and outdoor scenes (Parkhurst, Law, & Niebur, 2002), neurobiologically plausible models predicted fixations to a certain extent. Such models have also been found to discriminate between fixated and control image regions better than chance (Kienzle, Wichmann, Schölkopf, & Franz, 2007). The saliency map approach has also been applied to movie clips, where it predicted fixation targets well above chance (Carmi & Itti, 2006; Le Meur, Le Callet, & Barba, 2007). Together, these results suggest that neurobiologically inspired models can discriminate between fixated and non-fixated regions.
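The saliency-map scheme described above (feature channels, center-surround differences across scales, nonlinear normalization, summation) can be illustrated with a deliberately simplified sketch. It is not the implementation of Itti, Koch, and Niebur (1998): it assumes a small RGB image given as nested lists of floats, uses only an intensity channel and two simplified color-opponent channels (orientation and motion are omitted), replaces Gaussian pyramids with full-resolution box blurs at a few hypothetical (center, surround) radii, and substitutes a crude stand-in for the original normalization operator. All function names are hypothetical.

```python
from typing import List

Image = List[List[float]]


def box_blur(img: Image, radius: int) -> Image:
    """Mean over a (2r+1)x(2r+1) window, clipped at the image borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += img[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def center_surround(img: Image, scales=((1, 3), (1, 5), (2, 6))) -> Image:
    """Local differences combined across scales: sum of |center - surround|
    over several (center radius, surround radius) pairs (hypothetical values)."""
    h, w = len(img), len(img[0])
    acc = [[0.0] * w for _ in range(h)]
    for c, s in scales:
        center, surround = box_blur(img, c), box_blur(img, s)
        for y in range(h):
            for x in range(w):
                acc[y][x] += abs(center[y][x] - surround[y][x])
    return acc


def normalize(m: Image) -> Image:
    """Stand-in for the nonlinear normalization (an assumption, not the
    original operator): rescale to [0, 1], then weight by (1 - mean)^2 so
    maps with a few strong peaks outweigh maps with uniform activity."""
    flat = [v for row in m for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # a flat map carries no conspicuity
        return [[0.0] * len(row) for row in m]
    scaled = [[(v - lo) / (hi - lo) for v in row] for row in m]
    mean = sum(v for row in scaled for v in row) / len(flat)
    weight = (1.0 - mean) ** 2
    return [[v * weight for v in row] for row in scaled]


def saliency_map(r: Image, g: Image, b: Image) -> Image:
    """Sum of normalized conspicuity maps for intensity and two simplified
    color-opponent channels (red/green and blue/yellow)."""
    h, w = len(r), len(r[0])
    intensity = [[(r[y][x] + g[y][x] + b[y][x]) / 3 for x in range(w)] for y in range(h)]
    red_green = [[r[y][x] - g[y][x] for x in range(w)] for y in range(h)]
    blue_yellow = [[b[y][x] - (r[y][x] + g[y][x]) / 2 for x in range(w)] for y in range(h)]
    conspicuity = [normalize(center_surround(ch)) for ch in (intensity, red_green, blue_yellow)]
    return [[sum(cm[y][x] for cm in conspicuity) for x in range(w)] for y in range(h)]
```

For example, on a uniform gray 20×20 image containing a small red square, the location of maximal activity in the returned map falls inside the square, while a completely uniform image yields a map of zeros. A real implementation would use an array library and proper image pyramids; plain lists are used here only to keep the sketch dependency-free.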
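Discrimination between fixated and control regions, as discussed above, is often summarized by the area under the ROC curve (AUC): the probability that a randomly chosen fixated location has a higher saliency value than a randomly chosen control location. The minimal sketch below assumes that saliency values have already been read out at fixated locations and at control locations (e.g., uniformly random pixels or fixations taken from other images); the specific measures used in the cited studies may differ, and the function name is hypothetical.

```python
from typing import Sequence


def fixation_auc(fixated: Sequence[float], control: Sequence[float]) -> float:
    """Mann-Whitney form of the ROC AUC: fraction of (fixated, control) pairs
    in which the fixated location is more salient; ties count one half.
    0.5 corresponds to chance, 1.0 to perfect discrimination."""
    if not fixated or not control:
        raise ValueError("need at least one fixated and one control value")
    wins = 0.0
    for f in fixated:
        for c in control:
            if f > c:
                wins += 1.0
            elif f == c:
                wins += 0.5
    return wins / (len(fixated) * len(control))


# Usage: 8 of the 9 (fixated, control) pairs favor the fixated location.
print(fixation_auc([0.9, 0.7, 0.4], [0.5, 0.2, 0.1]))
```

The pairwise loop is quadratic in the number of samples; for large fixation sets a rank-based computation gives the same value more efficiently.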