September 2024
Volume 24, Issue 10
Open Access
Vision Sciences Society Annual Meeting Abstract  |   September 2024
Generalizing fixation predictions within and across datasets: towards a universal model of free-viewing fixations
Author Affiliations & Notes
  • Matthias Kümmerer
    University of Tübingen, Tübingen AI Center
  • Harneet Singh Khanuja
    University of Tübingen, Tübingen AI Center
  • Matthias Bethge
    University of Tübingen, Tübingen AI Center
  • Footnotes
    Acknowledgements  This work was supported by the Deutsche Forschungsgemeinschaft (DFG): Germany's Excellence Strategy - EXC 2064/1 - 390727645 and SFB 1233, Robust Vision: Inference Principles and Neural Mechanisms.
Journal of Vision September 2024, Vol.24, 1268. doi:https://doi.org/10.1167/jov.24.10.1268



      © ARVO (1962-2015); The Authors (2016-present)

Abstract

Predicting free-viewing fixation locations has a long history in both vision science and computer vision. Recent high-performing models are deep-learning-based models that are trained on an eye movement dataset such as MIT1003 and subsequently evaluated on benchmarks such as the MIT/Tuebingen Saliency Benchmark, which assess model performance on one or multiple datasets. An important challenge that has so far been addressed only marginally is generalization across domains: saliency models should correctly predict fixation densities for any image and recording setup. In this work, we combine a substantial range of eye movement datasets, including MIT1003, CAT2000, COCO Freeview, FIGRIM, NUSEF, OSIE and others, to create a large-scale compound dataset that we envision growing further over time, aiming for maximal size and diversity. On this dataset, we train a fixation prediction model, an extended and improved variant of DeepGaze IIE that combines multiple pretrained deep backbones in a joint readout architecture. After training on all or a subset of these datasets, the model is evaluated on the validation splits of all datasets. Our best model improves on the state of the art by a significant margin on many commonly used benchmark datasets, including MIT300, CAT2000 and COCO Freeview. Our modeling paradigm allows us to assess to what degree gaze patterns from one dataset generalize to other datasets, to what degree using multiple datasets creates synergy effects due to the greater diversity in the data, or to what degree different datasets show conflicting patterns. For example, we find that different datasets require different rescalings of local priority values in a way that is partially, but not fully, explained by different presentation times. Such analyses hint at underlying mechanisms that need to be understood and incorporated into models in order to build fixation models that are reliably applicable in diverse contexts.
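
One way to picture the dataset-specific rescaling of local priority values is the minimal sketch below. It assumes, purely for illustration, that a model yields a shared priority map and that each dataset applies its own multiplicative scale before the map is normalized into a fixation density. The abstract does not specify the actual parameterization, and the names `priority_to_density`, `mean_log_likelihood` and `scale` are hypothetical.

```python
import numpy as np


def priority_to_density(priority, scale=1.0):
    """Convert a 2D priority map into a fixation density.

    `scale` is a hypothetical dataset-specific factor (an assumption of this
    sketch, not a parameter taken from the abstract). The scaled map is passed
    through a softmax over all pixels so that the result sums to one.
    """
    logits = scale * np.asarray(priority, dtype=float)
    logits = logits - logits.max()  # subtract the max for numerical stability
    density = np.exp(logits)
    return density / density.sum()


def mean_log_likelihood(density, fixations):
    """Average log-density at the given (row, col) fixation locations."""
    rows, cols = np.asarray(fixations).T
    return float(np.mean(np.log(density[rows, cols])))


# Toy priority map and toy fixations (made-up values, for illustration only).
priority = np.array([[0.0, 1.0],
                     [2.0, 3.0]])
fixations = [(1, 1), (1, 0), (1, 1)]

for scale in (0.5, 1.0, 2.0):
    density = priority_to_density(priority, scale)
    print(scale, mean_log_likelihood(density, fixations))
```

In this toy setting, a larger scale concentrates the density on high-priority pixels and a smaller scale flattens it toward uniform, so the scale that best explains a set of fixations can differ between datasets even when the underlying priority map is shared.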
