August 2009
Volume 9, Issue 8
Vision Sciences Society Annual Meeting Abstract | August 2009
Gist based top-down templates for gaze prediction
Author Affiliations
  • Zhicheng Li
    Department of Computer Science, University of Southern California, and School of Automation Science and Electrical Engineering, Beihang University
  • Laurent Itti
    Department of Computer Science, University of Southern California
Journal of Vision August 2009, Vol.9, 202. doi:10.1167/9.8.202
© ARVO (1962-2015); The Authors (2016-present)


People use focal visual attention and rapid eye movements to analyze complex visual inputs in a manner that depends strongly on the current scene's properties. Here we propose a top-down attention model that exploits visual templates associated with different types of scenes. During training, an image set was manually classified into several scene categories, and for each category we empirically defined a top-down map that highlights locations likely to be of interest. "Gist" feature vectors were then computed for each category's images to build a Gaussian gist feature distribution, or signature, of that category. During testing, the input image's gist feature vector is computed first; from this vector and the learned categories' gist feature distributions, a set of weights is obtained by evaluating the categories' probability density functions. The top-down map is then the weighted sum of the predefined templates. Finally, the top-down map is combined with a bottom-up saliency map (Itti & Koch, 2001) to generate a final attention guidance map. In eye-tracking validation experiments, two types of video were used as test data: an original set of captured video clips, and a second set built by cutting the original clips into short 1–3 s segments and re-assembling them. In the original clips, the area under the curve (AUC) score and the KL distance (higher is better for both) of the standard bottom-up saliency map were 0.665 and 0.185, while the attention guidance map scored 0.688 and 0.242, respectively; on the re-assembled clips, the standard bottom-up model scored 0.648 and 0.145, while the combined model scored 0.718 and 0.327. These results suggest that attention selection can be made more accurate by the proposed top-down component.

[1] Itti, L., & Koch, C. (2001). Computational modelling of visual attention. Nature Reviews Neuroscience, 2(3), 194–203.
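The pipeline described above (evaluate the input's gist vector under each category's Gaussian signature, mix the predefined templates by those likelihoods, then combine with bottom-up saliency) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the diagonal-covariance Gaussians, the convex combination rule, and all function names and dimensions are assumptions.

```python
import numpy as np

def gaussian_pdf(gist, mean, var):
    """Likelihood of a gist vector under a diagonal-covariance Gaussian
    signature (product of per-dimension 1-D Gaussian densities)."""
    return np.prod(np.exp(-0.5 * (gist - mean) ** 2 / var)
                   / np.sqrt(2.0 * np.pi * var))

def top_down_map(gist, means, variances, templates):
    """Weight each scene category's predefined top-down template by the
    likelihood of the input image's gist vector under that category's
    Gaussian signature, and sum the weighted templates."""
    weights = np.array([gaussian_pdf(gist, m, v)
                        for m, v in zip(means, variances)])
    weights /= weights.sum()  # normalize the category weights
    # Weighted sum over the category axis: (K,) x (K, H, W) -> (H, W)
    return np.tensordot(weights, templates, axes=1)

def guidance_map(top_down, bottom_up, alpha=0.5):
    """Combine the top-down map with a bottom-up saliency map (Itti & Koch,
    2001). The abstract does not state the combination rule; a simple
    convex mix with assumed weight alpha is used here."""
    return alpha * top_down + (1.0 - alpha) * bottom_up
```

For example, with K scene categories, each category contributes a mean/variance pair over the gist dimensions and one template image; an input frame's gist vector then determines how strongly each template shapes the final guidance map.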

Li, Z., & Itti, L. (2009). Gist based top-down templates for gaze prediction [Abstract]. Journal of Vision, 9(8):202, 202a, doi:10.1167/9.8.202.
The authors gratefully acknowledge the contribution of NSF, HFSP, NGA, DARPA, and the China Scholarship Council.
