December 2022
Volume 22, Issue 14
Open Access
Vision Sciences Society Annual Meeting Abstract
Using the DeepGaze III model to decompose spatial and dynamic contributions to fixation placement over time
Author Affiliations & Notes
  • Matthias Kümmerer
    University of Tübingen
  • Thomas S.A. Wallis
    Institute of Psychology and Centre for Cognitive Science, Technical University of Darmstadt
  • Matthias Bethge
    University of Tübingen
  • Footnotes
    Acknowledgements  German Federal Ministry of Education and Research (BMBF): Tübingen AI Center, FKZ: 01IS18039A and the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation): Germany’s Excellence Strategy – EXC 2064/1 – 390727645 and SFB 1233, Robust Vision: Inference Principles and Neural Mechanisms.
Journal of Vision December 2022, Vol.22, 3964. doi:
      Matthias Kümmerer, Thomas S.A. Wallis, Matthias Bethge; Using the DeepGaze III model to decompose spatial and dynamic contributions to fixation placement over time. Journal of Vision 2022;22(14):3964.

      © ARVO (1962-2015); The Authors (2016-present)

Humans view images in scanpaths of fixations, moving their gaze over the image to explore its interesting parts. Which factors govern such scanpaths, and how they change over time, has been the subject of substantial research. The deep-learning-based DeepGaze III model currently sets the state of the art in free-viewing scanpath prediction on natural images. It combines a spatial prediction module, which captures the influence of scene content on fixation placement, with a scanpath history module, which captures the influence of earlier fixations and therefore the dynamics of the scanpath. Here, we conduct a series of ablation studies, training variants of DeepGaze III with no access to scene content, to scanpath history, or to both, and analyse how well fixations are predicted over the course of free-viewing scanpaths. We find that the overall predictability of fixations decays substantially over the course of scanpaths. Comparing the ablated models allows us to attribute this decay almost entirely to a decay in the influence of scene content: early fixations focus on the most salient image areas. Additionally, there is an influence of the initial fixation, which has mostly decayed after one saccade. Interestingly, the effect of scanpath history stays nearly constant over the scanpaths, suggesting that the dynamics of scene viewing do not change substantially over time. Given the capacity of our models relative to the size of the dataset, we can assume that the models capture all substantive ways in which fixations could depend on image content and previous fixations (in the context of free viewing). This avoids the possible confound that the models are simply not good enough to predict fixations well. This study provides an example of how deep-learning-based models can be used to further our understanding of human behaviour.
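
The ablation comparison described above can be sketched roughly as follows. This is a minimal illustration, not the authors' analysis code. It assumes that each model variant yields a per-fixation log-likelihood in bits and that predictability is summarised as the mean log-likelihood gain over the variant with access to neither scene content nor scanpath history; the abstract does not state the metric. All function and variable names are hypothetical, and the numbers are toy values chosen only to exercise the code.

```python
import numpy as np


def gain_by_fixation_index(log_likelihoods, baseline, fixation_index):
    """Mean log-likelihood gain over a baseline, grouped by fixation index.

    All inputs are 1-D sequences with one entry per fixation.
    Returns the sorted unique fixation indices and the mean gain for each.
    """
    gain = np.asarray(log_likelihoods, dtype=float) - np.asarray(baseline, dtype=float)
    fixation_index = np.asarray(fixation_index)
    indices = np.unique(fixation_index)
    means = np.array([gain[fixation_index == i].mean() for i in indices])
    return indices, means


def decompose(ll_full, ll_scene_only, ll_history_only, ll_neither, fixation_index):
    """Compare ablated variants against the variant with no scene or history access.

    'scene' and 'history' are the gains of the single-source variants (assumed
    names); 'total' is the gain of the full model.
    """
    indices, total = gain_by_fixation_index(ll_full, ll_neither, fixation_index)
    _, scene = gain_by_fixation_index(ll_scene_only, ll_neither, fixation_index)
    _, history = gain_by_fixation_index(ll_history_only, ll_neither, fixation_index)
    return {"index": indices, "total": total, "scene": scene, "history": history}


# Toy values only (not results from the study): two fixations at index 1, two at index 2.
toy_index = [1, 1, 2, 2]
toy = decompose(
    ll_full=[3.0, 3.0, 2.0, 2.0],
    ll_scene_only=[2.0, 2.0, 1.0, 1.0],
    ll_history_only=[1.0, 1.0, 1.0, 1.0],
    ll_neither=[0.0, 0.0, 0.0, 0.0],
    fixation_index=toy_index,
)
for i, total, scene, history in zip(toy["index"], toy["total"], toy["scene"], toy["history"]):
    print(f"fixation {i}: total={total:.2f} scene={scene:.2f} history={history:.2f}")
```

Plotting the three gains against fixation index would show which component accounts for a change in overall predictability. The scene and history gains need not sum to the total, since the two sources of information can overlap or interact.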

