Abstract
Convolutional neural networks (CNNs) have been shown to be good models of the human visual system. In particular, when processing visual object information, the earlier layers of CNNs behave similarly to lower-level visual areas in the human cortex (e.g., V1), whereas the deeper layers behave similarly to higher-level visual areas (e.g., inferotemporal cortex). In this study we sought to examine the similarities and differences between CNN response patterns and human visual response patterns when processing affective scenes. fMRI data were recorded from human participants viewing pleasant, neutral, and unpleasant images from the International Affective Picture System (IAPS). VGG16, trained on ImageNet, was taken as the CNN model of the human visual system. Applying representational similarity analysis (RSA), we constructed representational dissimilarity matrices (RDMs) for different brain areas using the fMRI data and for different pooling layers of VGG16, and correlated the RDMs from the neural data with those from the CNN model. The strongest correlations were found between pooling layer 4 and visual areas in both the ventral and dorsal pathways. Further analysis will include (1) characterizing emotion processing in the early visual cortex, the dorsal visual cortex, and the ventral visual cortex, (2) characterizing emotion processing in different layers of the CNN model, (3) quantifying the proportion of explainable brain RDM variance captured by specific CNN layers, (4) visualizing the emotion representations in visual pathways and CNN layers using multi-dimensional scaling (MDS), and (5) decoding emotion representations in visual pathways using emotion representations from features of CNN layers.
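The RSA pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' actual analysis code: the stimulus count, dissimilarity measure (1 minus Pearson correlation), and comparison statistic (Spearman correlation of the RDM upper triangles) are common RSA choices assumed here, and the data are random placeholders standing in for fMRI voxel patterns and VGG16 pooling-layer features.

```python
import numpy as np
from scipy.stats import spearmanr

def rdm(patterns):
    """Representational dissimilarity matrix: 1 - Pearson correlation
    between each pair of stimulus response patterns (rows)."""
    return 1.0 - np.corrcoef(patterns)

def compare_rdms(rdm_a, rdm_b):
    """Spearman correlation of the upper triangles of two RDMs,
    a standard RSA comparison statistic."""
    iu = np.triu_indices_from(rdm_a, k=1)
    rho, _ = spearmanr(rdm_a[iu], rdm_b[iu])
    return rho

# Placeholder data: 60 stimuli (e.g., IAPS images) x feature dimensions.
rng = np.random.default_rng(0)
brain_patterns = rng.standard_normal((60, 200))  # e.g., voxels in one brain area
cnn_patterns = rng.standard_normal((60, 512))    # e.g., one VGG16 pooling layer

rho = compare_rdms(rdm(brain_patterns), rdm(cnn_patterns))
print(f"brain-CNN RDM correlation (Spearman rho): {rho:.3f}")
```

Repeating the comparison across brain areas and pooling layers yields the layer-by-area correlation profile from which the peak at pooling layer 4 was identified.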