October 2020
Volume 20, Issue 11
Open Access
Vision Sciences Society Annual Meeting Abstract
Saliency Map Predictions of DeepGaze II are Influenced by the Convolutional Neural Network Texture Bias
Author Affiliations & Notes
  • Charlotte A. Leferink
    Department of Psychology, University of Toronto
  • Dirk B. Walther
    Department of Psychology, University of Toronto
    Samsung Artificial Intelligence Center Toronto
  • Footnotes
    Acknowledgements  This work was supported by an NSERC Discovery Grant (#498390) and the Canadian Foundation for Innovation (#32896) to DBW.
Journal of Vision October 2020, Vol.20, 963. doi:https://doi.org/10.1167/jov.20.11.963

      Charlotte A. Leferink, Dirk B. Walther; Saliency Map Predictions of DeepGaze II are Influenced by the Convolutional Neural Network Texture Bias. Journal of Vision 2020;20(11):963. https://doi.org/10.1167/jov.20.11.963.


      © ARVO (1962-2015); The Authors (2016-present)


Though line drawings depict the edges of objects but not the texture or colour typically present in the natural environment, humans can recognize scenes depicted in line drawings just as well as those in colour photographs. It has recently been shown that most convolutional neural networks (CNNs) rely on texture more than on the edges and shapes of the objects depicted in an image. But what are the effects of these model constraints on modelling visual attention? Here we show that, like humans, a leading CNN-based model of spatial attention, DeepGaze II, generalizes well between photographs and edge-extracted images. Seemingly innocuous low-level changes, however, such as reversing the contrast polarity of the edge-extracted images, cause vastly different predictions by the attention model. This is not the case for human observers, since contrast polarity reversal preserves the structure and global properties of the objects within an image. These results provide further evidence that CNNs rely on texture-based visual information to generate their predictions. To explore these questions further, we recorded eye movements of participants viewing edge-extracted versions of the images from the MIT1003 dataset, which serves as ground truth for DeepGaze II. Comparing predicted gaze maps generated from line drawings and from photographs with human fixations on line drawings and photographs showed that both humans and the model generalize well between these drastically different representations of the scenes. Reversing the contrast polarity of the drawings, on the other hand, drastically changed the predicted gaze maps. Our unique eye-movement data set and analysis procedures allow us to further probe the limitations of the CNN texture bias and to investigate the capacity of CNNs to learn global shape representations as they apply to directing visual attention.
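The contrast polarity reversal described above can be sketched as a simple intensity inversion of an edge-extracted image. The toy array below is a hypothetical stand-in for a line drawing (dark strokes on a white background), not the study's actual stimulus pipeline; it illustrates why the manipulation preserves edge locations and global structure while flipping local polarity:

```python
import numpy as np

# Hypothetical 8-bit "line drawing": dark edge strokes (0) on a white
# background (255). Values are illustrative, not from the study's stimuli.
line_drawing = np.full((4, 4), 255, dtype=np.uint8)
line_drawing[1, :] = 0  # one horizontal "edge"

# Contrast polarity reversal: invert intensities, so edges become bright
# strokes on a black background.
reversed_polarity = 255 - line_drawing

# The set of edge-pixel locations is identical; only the polarity differs.
edges_original = line_drawing == 0          # dark strokes in the original
edges_reversed = reversed_polarity == 255   # bright strokes after reversal
assert np.array_equal(edges_original, edges_reversed)
```

Because the edge geometry is unchanged, human scene recognition and fixation patterns are largely unaffected by this manipulation, whereas a texture-sensitive model's local filter responses change sign everywhere.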

