Abstract
Convolutional neural networks (CNNs) have been shown to bear representational similarities to neural activity patterns and behavior in response to visual stimuli, indicating their potential as computational models of visual processing. Typically, for a given convolutional layer in a CNN, feature maps are concatenated before comparing them with brains and behavior. However, feature maps contain substantial redundant information, leaving open the question of to what degree they can be reduced without losing information. Since feature maps contain spatial information, downsampling can reveal what level of spatial detail is required for comparisons to brains and behavior. Across numerous datasets, we evaluated how downsampling affects the correspondence between CNNs and human behavioral, fMRI, and MEG responses, using representational similarity analysis. Results show that in many cases, downsampling even increased representational similarities while preserving the dynamic and hierarchical features of their correspondence to brain data. For the first three convolutional layers, downsampling to 25% of the features led to consistent improvements for all analyses involving fMRI data, while across all data modalities, this was the case for 90.5% of the analyses. In contrast, effects were smaller for higher convolutional layers, with improvements in only 64.3% of cases. More extreme downsampling to 10% of the features continued to yield positive effects for fMRI analyses. These results demonstrate that the level of spatial detail required for evaluating representational similarities may be comparatively low, indicating that similarities between CNNs and human data may be based on simpler information than previously thought. In addition, downsampling may be a simple yet effective method for improving statistical power while reducing the computational burden.
In conclusion, we highlight the possibility of resizing feature maps to maintain or improve the representational correspondence between networks, brains, and behavior, with implications for the assumed complexity of measured representational similarities.
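The pipeline described above can be illustrated with a minimal sketch: spatially downsample a layer's feature maps to a target fraction of features, build representational dissimilarity matrices (RDMs), and compare them with Spearman correlation, as in representational similarity analysis. The block-average pooling and function names below are illustrative assumptions, not the exact implementation used in the study; bilinear interpolation would be a common alternative for the resizing step.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr


def downsample_feature_maps(features, keep_fraction=0.25):
    """Spatially downsample feature maps, then flatten them.

    features: array of shape (n_stimuli, channels, height, width).
    keep_fraction: approximate fraction of spatial units to retain,
    e.g. 0.25 as in the first-layer analyses described above.
    (Block-average pooling is an assumed, illustrative choice.)
    """
    n, c, h, w = features.shape
    # Choose side lengths so (new_h * new_w) / (h * w) ~ keep_fraction.
    new_h = max(1, int(round(h * np.sqrt(keep_fraction))))
    new_w = max(1, int(round(w * np.sqrt(keep_fraction))))
    out = np.zeros((n, c, new_h, new_w))
    for i in range(new_h):
        for j in range(new_w):
            h0, h1 = i * h // new_h, (i + 1) * h // new_h
            w0, w1 = j * w // new_w, (j + 1) * w // new_w
            out[:, :, i, j] = features[:, :, h0:h1, w0:w1].mean(axis=(2, 3))
    # Concatenate the downsampled maps into one feature vector per stimulus.
    return out.reshape(n, -1)


def rdm(features):
    """Condensed RDM: pairwise correlation distances between stimuli."""
    return pdist(features, metric="correlation")


def rsa_similarity(rdm_a, rdm_b):
    """Representational similarity: Spearman correlation of two RDMs."""
    return spearmanr(rdm_a, rdm_b)[0]
```

With model activations and a brain-derived RDM in hand, one would compare `rsa_similarity(rdm(downsample_feature_maps(acts)), brain_rdm)` against the similarity obtained from the full, non-downsampled feature maps.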