Open Access
Vision Sciences Society Annual Meeting Abstract | September 2019
Meaning maps and deep neural networks are insensitive to meaning when predicting human fixations
Author Affiliations & Notes
  • Marek A. Pedziwiatr
    Cardiff University Brain Research Imaging Centre, School of Psychology, Cardiff University
  • Thomas S.A. Wallis
    Werner Reichardt Centre for Integrative Neuroscience, University of Tübingen
    Wilhelm-Schickard Institute for Computer Science (Informatik), University of Tübingen
  • Matthias Kümmerer
    Werner Reichardt Centre for Integrative Neuroscience, University of Tübingen
  • Christoph Teufel
    Cardiff University Brain Research Imaging Centre, School of Psychology, Cardiff University
Journal of Vision, September 2019, Vol. 19, Issue 10, 253c. https://doi.org/10.1167/19.10.253c
© ARVO (1962-2015); The Authors (2016-present)
Abstract

An important aspect of vision – the control of eye movements in scene viewing – is intensely debated, with many studies suggesting that people look at scene regions rich in meaning. A recent proposal suggests that the distribution of meaning can be quantified by ‘Meaning Maps’ (MMs). To create MMs, images are segmented into partially overlapping patches, which are rated for their meaningfulness by multiple observers. These ratings are then combined into a smooth distribution over the image. If MMs capture the distribution of meaning, and if the deployment of eye movements in humans is guided by meaning, two predictions arise: first, MMs should be better predictors of gaze position than saliency models, which use image features rather than meaning to predict fixations; second, differences in eye movements that result from changes in meaning should be reflected in equivalent differences in MMs. Here, we tested these predictions. Results show that MMs performed better than the simplest saliency model (GBVS), were similar to a more advanced model (AWS), and were outperformed by DeepGaze II – a model using features from a deep neural network. These data suggest that, similar to saliency models, MMs might not measure meaning but instead index the distribution of image features. Using the SCEGRAM database, we tested this notion directly by comparing scenes containing consistent object-context relationships with identical images in which one object was contextually inconsistent, thus changing its meaning (e.g., a kitchen with a mug swapped for a toilet roll). Replicating previous studies, regions containing inconsistencies attracted more fixations from observers than the same regions in consistent scenes. Crucially, however, MMs of the modified scenes did not attribute more ‘meaning’ to these regions. DeepGaze II exhibited the same insensitivity to meaning. Both methods are thus unable to capture changes in the deployment of eye movements induced by changes in an image’s meaning.
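
The patch-rating procedure summarised in the abstract can be made concrete with a short Python sketch. This is a minimal illustration of the general idea, not the authors' actual pipeline: the patch size, stride, 1-6 rating scale, Gaussian smoothing bandwidth, and the toy AUC-style fixation score below are all assumptions introduced for demonstration.

    # Minimal sketch of a Meaning-Map-style construction: average per-patch
    # meaningfulness ratings back onto the image and smooth them, then score
    # the resulting map against fixation locations. All parameters are
    # illustrative assumptions, not those used in the study.
    import numpy as np
    from scipy.ndimage import gaussian_filter

    def build_meaning_map(patch_ratings, image_shape, patch_size=128, sigma=32):
        """Combine per-patch ratings into a smooth distribution over the image."""
        h, w = image_shape
        acc = np.zeros((h, w))   # summed ratings per pixel
        cnt = np.zeros((h, w))   # number of patches covering each pixel
        for (r, c), rating in patch_ratings.items():
            acc[r:r + patch_size, c:c + patch_size] += rating
            cnt[r:r + patch_size, c:c + patch_size] += 1.0
        mean_map = np.divide(acc, cnt, out=np.zeros_like(acc), where=cnt > 0)
        return gaussian_filter(mean_map, sigma=sigma)   # smooth map

    def fixation_score(pred_map, fixations):
        """Toy AUC-style score: probability that a fixated pixel outranks a random pixel."""
        fix_vals = np.array([pred_map[y, x] for (y, x) in fixations])
        all_vals = pred_map.ravel()
        return float(np.mean(fix_vals[:, None] > all_vals[None, :]))

    # Purely synthetic demo: random ratings on a grid of 50%-overlapping
    # patches and random 'fixations'.
    rng = np.random.default_rng(0)
    h, w = 512, 768
    patch_ratings = {(r, c): rng.uniform(1, 6)          # assumed 1-6 rating scale
                     for r in range(0, h, 64) for c in range(0, w, 64)}
    meaning_map = build_meaning_map(patch_ratings, (h, w))
    fixations = [(int(rng.integers(h)), int(rng.integers(w))) for _ in range(50)]
    print("toy fixation score:", fixation_score(meaning_map, fixations))

The same scoring scaffold could, under these assumptions, be applied to the output of a saliency model or DeepGaze II, which is the kind of comparison the abstract describes.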
