Abstract
Stored semantic knowledge gained from experience is thought to play an important role in how we guide our attention in real-world scenes. Meaning mapping (Henderson & Hayes, 2017) uses human raters to directly estimate different semantic features in scenes and has been a useful tool for demonstrating the important role semantics play in guiding attention. However, it has recently been suggested that meaning maps do not capture semantic content in scenes but, like deep learning models of scene attention, instead represent semantically neutral image features. To test this hypothesis directly, we compared intact scene images with scene images in which local semantic content was removed using a diffeomorphic transformation (Stojanoski & Cusack, 2014). The diffeomorphic transformation was designed to preserve image properties while removing meaning, so the diffeomorphed scenes serve as adversarial images and provide an ideal test of our question of interest. The current experiment tested whether human-generated meaning maps (N=164 original and N=164 diffeomorphed) and three state-of-the-art deep learning models (MSI-Net, DeepGaze II, and SAM-ResNet) were sensitive to the loss of semantic content in the diffeomorphed scene regions. If humans base their meaning ratings on semantically neutral image features, then they should rate the diffeomorphed regions similarly to the original, non-diffeomorphed regions. However, if humans base their meaning ratings on semantic content, then we should observe a large decrease in rated meaning in the diffeomorphed regions, where the semantic scene content has been removed. The results were clear: meaning maps generated by human raters showed a large decrease in the diffeomorphed scene regions, while all three deep saliency models showed a moderate increase in those regions. These results demonstrate that meaning maps reflect local semantic content in scenes, while deep saliency models reflect something else.
We conclude that the meaning mapping approach is an effective tool for estimating semantic content in scenes.