Abstract
DeepGaze (Kümmerer et al., 2017) is an AI platform trained on, and designed to generate saliency maps for, emotionally neutral natural images. The extent to which it can generate saliency maps for natural images varying in emotional content remains unknown. We addressed this question (1) by comparing DeepGaze-generated saliency maps with eye-tracking data from human observers viewing affective scenes from the International Affective Picture System (IAPS) (Fan et al., 2018) and (2) by analyzing the impact of occluding the image regions highlighted by the DeepGaze-generated saliency maps on emotion recognition by a recently proposed VisualCortex-Amygdala (VCA) deep learning system for valence analysis. We found that (1) the saliency maps generated by DeepGaze matched the eye-tracking data better than those generated by other methods, including Grad-CAM (Selvaraju et al., 2017), and (2) occluding the regions highlighted by the DeepGaze-generated saliency maps led to significantly greater deterioration in valence recognition by the VCA system than occluding random patches of pixels. These results suggest that DeepGaze, despite being trained on emotionally neutral images, can generate saliency maps that capture the emotional significance of the images.
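To make the two analyses concrete, the sketch below illustrates (a) one standard way to score a saliency map against human fixations, the normalized scanpath saliency (NSS), and (b) a saliency-guided occlusion test contrasted with a random-occlusion control. This is a minimal illustration under our own assumptions, not the paper's code or the DeepGaze API; the names `valence_model`, `deepgaze`, and `load_iaps_image` in the usage comments are hypothetical stand-ins.

```python
import numpy as np

def nss(saliency, fix_rows, fix_cols):
    """Normalized Scanpath Saliency: mean z-scored saliency at fixated pixels.

    `saliency` is an H x W float array; `fix_rows`/`fix_cols` are arrays of
    fixation coordinates from eye tracking. Higher values indicate a better
    match between the saliency map and human fixations.
    """
    z = (saliency - saliency.mean()) / (saliency.std() + 1e-8)
    return z[fix_rows, fix_cols].mean()

def occlude_top_salient(image, saliency, frac=0.1, fill=0.0):
    """Mask the top `frac` most salient pixels of `image` with `fill`.

    `image` is H x W x C; `saliency` is H x W. Returns the occluded image
    and the boolean mask, so the control can match the occluded area.
    """
    thresh = np.quantile(saliency, 1.0 - frac)  # cutoff for the top-frac pixels
    mask = saliency >= thresh
    out = image.copy()
    out[mask] = fill
    return out, mask

def occlude_random(image, n_pixels, fill=0.0, rng=None):
    """Control condition: mask `n_pixels` randomly chosen pixels."""
    if rng is None:
        rng = np.random.default_rng(0)
    h, w = image.shape[:2]
    chosen = rng.choice(h * w, size=n_pixels, replace=False)
    mask = np.zeros(h * w, dtype=bool)
    mask[chosen] = True
    out = image.copy()
    out[mask.reshape(h, w)] = fill
    return out

# Hypothetical usage (valence_model stands in for the VCA system):
# image = load_iaps_image(...)
# saliency = deepgaze(image)
# occ_sal, mask = occlude_top_salient(image, saliency, frac=0.1)
# occ_rnd = occlude_random(image, n_pixels=int(mask.sum()))
# drop_sal = valence_model(image) - valence_model(occ_sal)   # saliency-guided drop
# drop_rnd = valence_model(image) - valence_model(occ_rnd)   # random-control drop
```

Matching the number of occluded pixels between the two conditions, as above, ensures that any extra deterioration under saliency-guided occlusion is attributable to where the pixels are removed rather than how many.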