September 2017
Volume 17, Issue 10
Open Access
Vision Sciences Society Annual Meeting Abstract  |   August 2017
DeepGaze II: Predicting fixations from deep features over time and tasks
Author Affiliations
  • Matthias Kümmerer
    Werner-Reichardt-Centre for Integrative Neuroscience, University Tübingen
  • Tom Wallis
    Werner-Reichardt-Centre for Integrative Neuroscience, University Tübingen
    Bernstein Center for Computational Neuroscience, Tübingen
  • Matthias Bethge
    Werner-Reichardt-Centre for Integrative Neuroscience, University Tübingen
    Bernstein Center for Computational Neuroscience, Tübingen
Journal of Vision August 2017, Vol.17, 1147. doi:10.1167/17.10.1147
Abstract

Where humans choose to look can tell us a lot about behaviour in a variety of tasks. Over the last decade, numerous models have been proposed to explain fixations when viewing still images. Until recently, these models failed to capture a substantial amount of the explainable mutual information between image content and fixation locations (Kümmerer et al., PNAS 2015). This limitation can be tackled effectively by using a transfer learning strategy ("DeepGaze I", Kümmerer et al., ICLR workshop 2015), in which features learned for object recognition are used to predict fixations. Our new model, "DeepGaze II", converts an image into the high-dimensional feature space of the VGG network. A simple readout network is then used to yield a density prediction. The readout network is pre-trained on the SALICON dataset and fine-tuned on the MIT1003 dataset. DeepGaze II explains 82% of the explainable information on held-out data and achieves top performance on the MIT Saliency Benchmark. The modular architecture of DeepGaze II allows a number of interesting applications. By retraining on partial data, we show that fixations after 500 ms of presentation time are driven by qualitatively different features than those during the first 500 ms, and we can predict on which images these changes will be largest. Additionally, we analyse how different viewing tasks (dataset from Koehler et al., 2014) change fixation behaviour and show that we are able to predict the viewing task from the fixation locations. Finally, we investigate to what extent fixations are driven by low-level cues versus high-level content: by replacing the VGG features with isotropic mean-luminance-contrast features, we create a low-level saliency model that outperforms all saliency models that preceded DeepGaze I (including saliency models using DNNs and other high-level features). We analyse how the contributions of high-level and low-level features to fixation locations change over time.
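The architecture described above (frozen deep features, a small readout network, and a softmax-normalised density) can be sketched in a few lines. This is a minimal toy illustration, not the authors' implementation: the layer sizes, weights, and centre-bias term are assumptions, and the VGG features are replaced by random tensors of the right shape.

```python
import numpy as np

def softmax_density(logits):
    """Normalise a 2D map of logits into a probability density over pixels."""
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

def readout_network(features, weights, biases):
    """Apply a stack of 1x1 convolutions (per-pixel linear layers) with
    ReLU nonlinearities to an (H, W, C) feature tensor; return an (H, W)
    single-channel saliency map."""
    x = features
    for i, (W, b) in enumerate(zip(weights, biases)):
        x = x @ W + b               # 1x1 conv == matrix multiply per pixel
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)  # ReLU on hidden layers only
    return x[..., 0]

rng = np.random.default_rng(0)
H, W, C = 16, 16, 512               # toy spatial size; VGG conv5 has 512 channels
features = rng.standard_normal((H, W, C))  # stand-in for frozen VGG features

# Toy readout: 512 -> 16 -> 1 channels (sizes are illustrative assumptions).
weights = [rng.standard_normal((C, 16)) * 0.05,
           rng.standard_normal((16, 1)) * 0.05]
biases = [np.zeros(16), np.zeros(1)]

saliency = readout_network(features, weights, biases)

# Add a log centre-bias prior (a common ingredient of fixation models;
# here just a Gaussian bump) before normalising to a density.
yy, xx = np.mgrid[0:H, 0:W]
centerbias = -((yy - H / 2) ** 2 + (xx - W / 2) ** 2) / (2 * (H / 4) ** 2)

density = softmax_density(saliency + centerbias)
assert abs(density.sum() - 1.0) < 1e-9  # a valid density sums to one
```

Because the output is a proper probability density over pixels, it can be evaluated with log-likelihood and information-theoretic measures rather than ad-hoc saliency metrics.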

Meeting abstract presented at VSS 2017
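The "82% of the explainable information" figure can be read as an information-gain ratio: the model's log-likelihood gain over a baseline, divided by the gain of a gold-standard model over the same baseline. The sketch below is an illustration only; the log-likelihood values are made up and chosen so the ratio lands at 0.82, and the baseline/gold-standard labels are assumptions.

```python
def explainable_information_ratio(ll_model, ll_baseline, ll_gold):
    """Fraction of the explainable information captured by the model:
    (LL_model - LL_baseline) / (LL_gold - LL_baseline)."""
    return (ll_model - ll_baseline) / (ll_gold - ll_baseline)

# Toy average log-likelihoods in bits per fixation (hypothetical values):
ll_baseline = -12.0   # e.g. a centre-bias-only baseline model
ll_gold = -9.0        # e.g. a gold standard estimated across subjects
ll_model = -9.54      # the model under evaluation

ratio = explainable_information_ratio(ll_model, ll_baseline, ll_gold)
print(round(ratio, 2))  # → 0.82
```

A ratio of 1.0 would mean the model predicts fixations as well as the gold standard; 0.0 would mean it adds nothing over the baseline.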
