Abstract
We can rapidly identify an image of a scene as a beach, a forest, or a highway. This ability relies in part on perceptual grouping cues. Interestingly, past studies have found that both the human visual system (HVS) and convolutional neural networks (CNNs) are sensitive to and benefit from perceptual grouping cues such as local symmetry in scenes. Yet, we still do not know exactly how local symmetry facilitates scene categorization, or whether the HVS and CNNs use this cue in a similar manner. In the present study, we explore this question with representational similarity analysis (RSA), comparing the scene representations of the HVS with those of the CNN VGG16. Specifically, for the HVS, we computed representational dissimilarity matrices (RDMs) for ten regions of interest (ROIs) in the visual cortex using the BOLD5000 dataset. For VGG16, we created an RDM for each convolutional layer. We then measured correlations between the RDMs for the ROIs and the VGG16 layers. Moreover, we correlated all RDMs with a symmetry dissimilarity matrix (SDM) based on the local symmetry in each scene. Consistent with previous results, we found that half of the participants showed high correlations between the RDMs and the SDM in low-level visual areas (e.g., V1), whereas the other half showed high correlations in mid- to high-level areas (V4 and RSC), suggesting some variability among observers. We also found that later layers of VGG16 exhibited stronger associations with the SDM than earlier layers. We expected this for a feed-forward network like VGG16, because local symmetry inherently involves longer-range spatial relationships, which are captured only in higher layers with their larger receptive fields. To conclude, although local symmetry influences both the HVS and VGG16, the two systems appear to process this cue differently, likely due to architectural limitations of feed-forward neural networks.
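A minimal sketch of the RSA comparison described above may help make the pipeline concrete. The data, variable names, and distance choices here are illustrative placeholders, not the study's actual pipeline: each RDM holds pairwise correlation distances between per-scene response patterns (voxels for an ROI, units for a VGG16 layer), the SDM holds pairwise differences in a per-scene local-symmetry score, and the matrices are compared via Spearman rank correlation over their condensed (upper-triangle) forms.

```python
# Illustrative RSA sketch (placeholder random data, not the study's pipeline).
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_scenes = 50

# Placeholder response patterns: rows are scenes, columns are features
# (e.g., voxels of an ROI, or units of a VGG16 convolutional layer).
roi_patterns = rng.standard_normal((n_scenes, 200))
layer_patterns = rng.standard_normal((n_scenes, 512))
symmetry_scores = rng.standard_normal(n_scenes)  # hypothetical per-scene local-symmetry measure

# RDMs: pairwise correlation distance between scene response patterns,
# returned in condensed (upper-triangle) form.
roi_rdm = pdist(roi_patterns, metric="correlation")
layer_rdm = pdist(layer_patterns, metric="correlation")

# SDM: pairwise absolute difference between per-scene symmetry scores.
sdm = pdist(symmetry_scores[:, None], metric="euclidean")

# Compare (dis)similarity structures with Spearman rank correlation.
rho_roi_sdm, _ = spearmanr(roi_rdm, sdm)
rho_layer_sdm, _ = spearmanr(layer_rdm, sdm)
rho_roi_layer, _ = spearmanr(roi_rdm, layer_rdm)
print(rho_roi_sdm, rho_layer_sdm, rho_roi_layer)
```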