Abstract
Convolutional neural networks (CNNs) show striking similarities to the ventral visual stream. However, their image classification behaviour on adversarially perturbed images diverges from that of humans. In particular, human-imperceptible image perturbations can cause a CNN to misclassify the image. Recent work suggests that the degree to which a system is robust to these perturbations could be related to the power law exponent, α, of the eigenspectrum of its neural responses. Informally, the theory states that if α < 1, then small perturbations to a stimulus could result in unbounded changes to the neural responses. Here, we test this hypothesis by comparing predictions from a set of standard and “robust” CNNs with neural responses in rodent and macaque primary visual cortex (V1). Specifically, we relate these models’ V1 fit quality to their accuracy on adversarial images and to their power law exponents. For both macaque and rodent neural responses, we found that model correspondence to V1 was correlated with adversarial accuracy. We then investigated the relationship between α and adversarial accuracy. When comparing each non-robust model with its robust counterpart, we found that all robust counterparts had higher α. Relating α to V1 response predictivity, we similarly found that the robust counterparts of non-robust models had both higher α and higher V1 predictivity. Across architectures, however, there was no relationship between these two quantities. Finally, we found that neurons in robust models were generally tuned to lower spatial frequencies than those in non-robust models, and that adversarial accuracy was somewhat negatively correlated with spatial frequency tuning. These observations suggest that developing biologically plausible techniques that increase α for a given architecture (e.g., by reducing representation dimensionality) and that bias models toward learning image features of lower spatial frequencies may improve tolerance to perturbations and V1 response predictivity.
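For concreteness, the exponent α referred to above is the slope of the eigenspectrum of the responses on a log-log plot. The sketch below is only an illustration of how such an exponent might be estimated from a stimuli-by-units response matrix using plain PCA eigenvalues and a log-log linear fit; the paper's actual procedure (e.g., a cross-validated eigenspectrum estimate) may differ, and the function name and fit range here are assumptions.

```python
# Illustrative sketch (not the paper's exact method): estimate the
# eigenspectrum power-law exponent alpha from responses of shape
# (n_stimuli, n_units) via PCA eigenvalues and a log-log line fit.
import numpy as np

def estimate_alpha(responses, fit_range=(10, 500)):
    """Return alpha, the negative slope of log(eigenvalue) vs log(rank)."""
    X = responses - responses.mean(axis=0, keepdims=True)
    # Eigenvalues of the sample covariance = squared singular values / (n - 1)
    s = np.linalg.svd(X, compute_uv=False)
    eigvals = s ** 2 / (X.shape[0] - 1)
    ranks = np.arange(1, eigvals.size + 1)
    lo, hi = fit_range
    sel = slice(lo - 1, min(hi, eigvals.size))
    slope, _ = np.polyfit(np.log(ranks[sel]), np.log(eigvals[sel]), 1)
    return -slope  # alpha near 1 corresponds to an approximately 1/n spectrum

# Synthetic example: responses whose covariance eigenvalues decay as 1/n,
# so the estimated alpha should be close to 1.
rng = np.random.default_rng(0)
n_stim, n_units = 2000, 1000
scale = np.arange(1, n_units + 1) ** -0.5   # sqrt of a 1/n eigenspectrum
responses = rng.standard_normal((n_stim, n_units)) * scale
print(estimate_alpha(responses))
```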