December 2022, Volume 22, Issue 14 (Open Access)
Vision Sciences Society Annual Meeting Abstract
Meaning maps detect the removal of local scene content but deep saliency models do not
Author Affiliations & Notes
  • Taylor R. Hayes
    University of California, Davis
  • John M. Henderson
    University of California, Davis
  • Footnotes
    Acknowledgements: Supported by the National Science Foundation (BCS-2019445)
Journal of Vision December 2022, Vol.22, 3752. doi:https://doi.org/10.1167/jov.22.14.3752
Abstract

Stored semantic knowledge gained from experience is thought to play an important role in how we guide our attention in real-world scenes. Meaning mapping (Henderson & Hayes, 2017) uses human raters to directly estimate different semantic features in scenes, and it has been a useful tool for demonstrating the important role semantics play in guiding attention. However, it has recently been suggested that meaning maps do not capture semantic content in scenes but, like deep learning models of scene attention, instead represent semantically neutral image features. To directly test this hypothesis, we compared intact scene images to scene images in which local semantic content was removed using a diffeomorphic transformation (Stojanoski & Cusack, 2014). The diffeomorphic transformation was designed to preserve image properties while removing meaning, making the transformed scenes ideal adversarial images for our question of interest. The current experiment tested whether human-generated meaning maps (N=164 original and N=164 diffeomorphed) and three state-of-the-art deep learning models (MSI-Net, DeepGaze II, and SAM-ResNet) were sensitive to the loss of semantic content in the diffeomorphed scene regions. If humans base their ratings of meaning on semantically neutral image features, then they should rate the diffeomorphed regions similarly to the original, non-diffeomorphed regions. However, if humans base their meaning ratings on semantic content, then we should observe a large decrease in rated meaning for the diffeomorphed regions, where the semantic scene content has been removed. The results were clear: meaning maps generated by human raters showed a large decrease in the diffeomorphed scene regions, whereas all three deep saliency models showed a moderate increase in those regions. These results demonstrate that meaning maps reflect local semantic content in scenes, while deep saliency models reflect something else. We conclude that the meaning-mapping approach is an effective tool for estimating semantic content in scenes.
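To make the logic of the region-level test concrete, here is a minimal sketch of how one could compare map values inside a diffeomorphed region before and after the transformation. This is a hypothetical illustration, not the authors' analysis code; the array names, the normalization step, and the region_mean helper are all assumptions.

import numpy as np

def region_mean(attention_map, region_mask):
    """Mean map value inside a region, with the map normalized to sum to 1.

    attention_map : 2D array (a meaning map or a deep saliency map)
    region_mask   : 2D boolean array marking the diffeomorphed region
    """
    norm_map = attention_map / attention_map.sum()
    return norm_map[region_mask].mean()

# Hypothetical usage: a drop from original_map to diffeo_map inside the
# masked region would indicate sensitivity to the removed semantic content,
# the pattern the abstract reports for human meaning maps.
# delta = region_mean(diffeo_map, mask) - region_mean(original_map, mask)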
