Abstract
The weighted salience model is a prominent account of image-driven visual attention (Itti & Koch, 2000). The model operates in stages: it samples the visual environment, calculates feature maps, combines them in a weighted sum, and uses the result to refixate the eye. Despite the model's simplicity, Bayesian approaches have become popular because they provide a single framework for examining the roles of bottom-up and top-down factors, such as spatial context (Torralba, 2003). We show that weighted salience models can be seen as implementing Bayesian target/non-target discrimination, which enables faster learning of feature weightings at the expense of assuming that targets and distracters are linearly separable in feature space. In conjunction searches with linearly separable targets and distracters, differences between predicted and actual visual search performance can be explained by an inbuilt assumption that the features are independent; this may be overcome by learning the joint statistics of those features.
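A minimal sketch of this pipeline, assuming 2-D NumPy feature maps (the function names, the min-max normalisation, and the example weights are illustrative choices for the sketch, not details of the published model; non-uniform retinal sampling is omitted here):

```python
import numpy as np

def salience_map(feature_maps, weights):
    """Combine feature maps into one salience map by a weighted sum,
    the core operation of the weighted salience model."""
    salience = np.zeros(feature_maps[0].shape, dtype=float)
    for fmap, w in zip(feature_maps, weights):
        # Normalise each map to [0, 1] before weighting
        # (this normalisation scheme is an illustrative choice).
        lo, hi = fmap.min(), fmap.max()
        if hi > lo:
            fmap = (fmap - lo) / (hi - lo)
        salience += w * fmap
    return salience

def next_fixation(salience):
    """Refixate at the location of peak salience (winner-take-all)."""
    return np.unravel_index(np.argmax(salience), salience.shape)

# Illustrative use with random stand-ins for feature maps
# (e.g. colour, intensity, orientation):
rng = np.random.default_rng(0)
maps = [rng.random((64, 64)) for _ in range(3)]
print(next_fixation(salience_map(maps, weights=[0.5, 0.3, 0.2])))
```

Under the Bayesian reading, the weights act as per-feature evidence coefficients: if the features are class-conditionally independent, the log posterior odds of target versus non-target decompose into a sum of per-feature terms, so a weighted sum of feature maps can implement the discrimination; this is where the linear-separability and independence assumptions enter.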
We examined each of these stages of the weighted salience model, starting by accounting for non-uniform retinal sampling. We found that spatial aliasing makes feature coding unreliable, raising unanswered questions about the distribution of receptive field properties in each feature map. With natural images, signal detection theory methods showed good discrimination between targets and non-targets in the weighted-sum ‘salience’ map. However, when we analysed the criterion of fixating ‘peak salience’, we found the strategy performed poorly when distracter heterogeneity was high, a situation that might be expected in complex real-world natural scenes.
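For intuition on why the ‘fixate the peak’ rule degrades with distracter heterogeneity, here is a hedged toy simulation (the Gaussian salience distributions, display size, and parameter values are all illustrative assumptions, not the study's data): even when pairwise target/distracter discriminability (d′) remains moderate, the most salient of many heterogeneous distracters increasingly out-competes the target.

```python
import numpy as np

rng = np.random.default_rng(1)

def d_prime(target_vals, distracter_vals):
    """Equal-variance Gaussian discriminability (d') between salience
    values at target and distracter locations."""
    pooled_sd = np.sqrt(0.5 * (np.var(target_vals) + np.var(distracter_vals)))
    return (np.mean(target_vals) - np.mean(distracter_vals)) / pooled_sd

# Simulated displays: one target, 20 distracters per trial; distracter
# heterogeneity is modelled as the spread of distracter salience values.
for spread in (0.25, 0.5, 1.0):
    targets = rng.normal(1.0, 0.25, size=1000)
    distracters = rng.normal(0.0, spread, size=(1000, 20))
    # Peak-fixation succeeds only if the target beats every distracter.
    hit_rate = np.mean(targets > distracters.max(axis=1))
    print(f"spread={spread}: d'={d_prime(targets, distracters.ravel()):.2f}, "
          f"peak-fixation hit rate={hit_rate:.2f}")
```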
In summary, the natural statistics of image features may need to be considered in order to explain variation in visual search performance. The ‘weighted salience’ model can be considered within the Bayesian approach, which allows bottom-up and top-down factors to be addressed in a single framework.
Funded by EPSRC grant no. GR/S47953/01(P).