Open Access
Vision Sciences Society Annual Meeting Abstract  |   December 2022
DeepGaze vs SceneWalk: what can DNNs and biological scan path models teach each other?
Author Affiliations & Notes
  • Lisa Schwetlick
    University of Potsdam
  • Matthias Kümmerer
    University of Tübingen
  • Ralf Engbert
    University of Potsdam
  • Matthias Bethge
    University of Tübingen
  • Footnotes
    Acknowledgements: German Federal Ministry of Education and Research (BMBF): Tübingen AI Center, FKZ: 01IS18039A & Deutsche Forschungsgemeinschaft (DFG): Collaborative Research Center (SFB) 1294, projects B03 and B05 (project no. 318763901)
Journal of Vision December 2022, Vol.22, 3986.

      Lisa Schwetlick, Matthias Kümmerer, Ralf Engbert, Matthias Bethge; DeepGaze vs SceneWalk: what can DNNs and biological scan path models teach each other? Journal of Vision 2022;22(14):3986.

      © ARVO (1962-2015); The Authors (2016-present)


Eye movements on natural scenes are driven by image content as well as by saccade dynamics and sequential dependencies. Recent research has produced a variety of models that aim to predict time-ordered fixation sequences, including statistical, mechanistic, and deep neural network (DNN) models, each with its own advantages and shortcomings. Here we show how a synthesis of different modeling frameworks may offer fresh insights into the underlying processes. First, the explanatory power of biologically inspired models can help develop an understanding of the mechanisms learned by DNNs. Second, DNN performance can be used to estimate data predictability and thereby help uncover new mechanisms. DeepGaze3 (DG3) is currently the best-performing DNN model for scan path prediction (Kümmerer & Bethge, 2020); SceneWalk (SW) is the best-performing biologically inspired dynamical model (Schwetlick et al., 2021). Both models can be fitted using maximum likelihood estimation and compute per-fixation likelihood predictions, so we can analyze prediction divergence at the level of individual fixations. DG3 generally outperforms SW, indicating that the DNN accounts for variance by learning mechanisms not yet included in the mechanistic SW model. Preliminary results show that SW tends to underestimate the probability of long, explorative saccades. In SW this behavior could be captured by replacing the Gaussian attention span with a function with heavier tails or by implementing temporal fluctuation of the attention span. Furthermore, DG3 appears to compress previously unexplored areas, increasing the likelihood of saccades toward the region center; once a region is fixated, DG3 broadens the local probability, consistent with a dualistic exploration-exploitation strategy. Adding corresponding mechanisms to SW may improve model performance and help develop more advanced dynamical models.
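The per-fixation comparison described above can be sketched as follows. This is an illustrative example only: the probability arrays are random stand-ins for the per-fixation likelihoods that fitted DG3 and SW models would produce, not the actual model APIs.

```python
import numpy as np

# Stand-in per-fixation probabilities from two fitted scan path models
# (hypothetical values; not the real DeepGaze3 or SceneWalk outputs).
rng = np.random.default_rng(0)
p_dg3 = rng.uniform(0.01, 0.5, size=20)  # P(fixation_i | history) under DG3
p_sw = rng.uniform(0.01, 0.5, size=20)   # same fixations under SceneWalk

# Per-fixation log-likelihood difference: positive values mean DG3
# assigns higher probability to that fixation than SceneWalk does.
delta = np.log(p_dg3) - np.log(p_sw)

# Fixations where the models diverge most strongly are candidates for
# mechanisms that DG3 has learned but SceneWalk does not yet include.
worst_for_sw = np.argsort(delta)[::-1][:5]
```

Because both models are fitted by maximum likelihood and emit a probability for every individual fixation, this divergence can be inspected fixation by fixation rather than only as an aggregate score.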
Finding the synergies between different modeling approaches, specifically high-performing DNNs and more transparent dynamical models, is a valuable tool for improving our understanding of fixation selection during scene viewing.
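The heavy-tails idea mentioned above can be made concrete with a toy comparison of a Gaussian attention span against a Cauchy-shaped alternative. The kernels and parameter values here are illustrative assumptions, not the actual SceneWalk parameterization.

```python
import numpy as np

def gaussian_kernel(d, sigma=1.0):
    # Gaussian attention span: probability decays like exp(-d^2),
    # so long saccade amplitudes d become vanishingly unlikely.
    return np.exp(-d**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

def cauchy_kernel(d, gamma=1.0):
    # Heavy-tailed alternative: probability decays only like 1/d^2,
    # leaving far more mass on long, explorative saccades.
    return gamma / (np.pi * (d**2 + gamma**2))

d_long = 5.0  # a long saccade amplitude, in units of the kernel scale
ratio = cauchy_kernel(d_long) / gaussian_kernel(d_long)
# At this amplitude the heavy-tailed kernel assigns several orders of
# magnitude more probability than the Gaussian does.
```

This is the sense in which swapping the Gaussian attention span for a heavier-tailed function could let SW stop underestimating long explorative saccades.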

