Zhicheng Li, Laurent Itti; Gist based top-down templates for gaze prediction. Journal of Vision 2009;9(8):202. doi: https://doi.org/10.1167/9.8.202.
© ARVO (1962-2015); The Authors (2016-present)
People use focal visual attention and rapid eye movements to analyze complex visual inputs in a manner that depends strongly on the current scene's properties. Here we propose a top-down attention model that exploits visual templates associated with different types of scenes. During training, an image set is manually classified into several scene categories, and for each category we define a corresponding top-down map that empirically highlights locations likely to be of interest. We then compute "gist" feature vectors for each category's images to build a Gaussian gist feature distribution, or signature, of that category. During testing, the gist feature vector of the input image is computed first; based on this vector and the previously learned categories' gist feature distributions, a set of category weights is obtained from the probability density functions. The top-down map is then the weighted sum of the predefined templates. Finally, the top-down map is combined with a bottom-up saliency map (Itti & Koch, 2001) to yield a final attention guidance map. In eye-tracking validation experiments, two types of video are used as test data: an original set of captured video clips, and a second set built by cutting the original clips into 1–3 s segments and reassembling them. On the original clips, the standard bottom-up saliency map achieves an area under the curve (AUC) score of 0.665 and a KL distance of 0.185 (higher is better), while the attention guidance map achieves 0.688 and 0.242, respectively; on the reassembled clips, the bottom-up model achieves 0.648 and 0.145, while the combined model achieves 0.718 and 0.327. These results suggest that attention selection can be made more accurate by the proposed top-down component.

Itti, L., & Koch, C. (2001). Computational modelling of visual attention. Nature Reviews Neuroscience, 2(3), 194–203.
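The weighting scheme described above can be sketched in code. The following is a minimal illustration, not the authors' implementation: it assumes each category's gist signature is a diagonal Gaussian (mean and per-dimension variance), evaluates each category's density at the input image's gist vector, normalizes the densities into weights, and blends the weighted template sum with a bottom-up saliency map. The function names, the normalization step, and the convex-combination parameter `alpha` are all assumptions for illustration.

```python
import numpy as np

def category_weights(gist, means, variances):
    """Evaluate each category's diagonal-Gaussian gist signature at `gist`
    and normalize the resulting densities into weights that sum to one.
    (Hypothetical formulation; the abstract does not specify normalization.)"""
    densities = []
    for mu, var in zip(means, variances):
        # Log-density of a diagonal Gaussian, exponentiated at the end
        # for numerical stability.
        log_p = -0.5 * np.sum(np.log(2.0 * np.pi * var) + (gist - mu) ** 2 / var)
        densities.append(np.exp(log_p))
    densities = np.array(densities)
    return densities / densities.sum()

def guidance_map(gist, means, variances, templates, saliency, alpha=0.5):
    """Top-down map = weighted sum of per-category templates (shape (K, H, W));
    the final guidance map is a convex combination with the bottom-up
    saliency map. The blend weight `alpha` is an assumed free parameter."""
    w = category_weights(gist, means, variances)
    top_down = np.tensordot(w, templates, axes=1)  # shape (H, W)
    return alpha * top_down + (1.0 - alpha) * saliency
```

An input whose gist vector lies close to one category's mean receives a large weight for that category, so the final map is dominated by that category's template plus bottom-up saliency.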