Abstract
The ventral visual stream is a complex, nonlinear system whose internal representations are currently best approximated through deep learning in convolutional neural networks (CNNs). Neuroscientists have been in search of the core principles that explain why some CNNs are better than others at predicting the responses in visual cortex. Previous efforts have focused on factors related to architecture, training task, visual diet, and interpretable properties of the learned features. Here, we take a different approach and seek to understand the performance of CNN models of the ventral stream in terms of their latent geometric properties. Specifically, we focus on latent dimensionality, which is the number of dimensions spanned by the activity space of a CNN’s responses to natural images. While low dimensionality can promote invariance to incidental image properties, high dimensionality increases expressivity and can support a wider range of behaviors. Thus, we asked: what level of CNN dimensionality is best for modeling activity in the primate ventral stream? To address this question, we estimated the dimensionality of a large set of CNNs trained on a variety of tasks using multiple datasets, and we assessed how well these CNNs performed as linear encoding models of object-evoked responses in ventral visual cortex using both human fMRI and monkey electrophysiology data. These analyses revealed a striking effect: higher-dimensional CNNs were better at predicting cortical responses. Importantly, our results cannot be explained as trivial statistical effects of dimensionality when fitting linear encoding models. There are, in fact, many alternative conditions under which high-dimensional models cannot accurately predict neural data, which we demonstrated both empirically and through simulations. Together, our findings suggest that CNN models of visual cortex may be best understood in terms of the latent geometric properties of their representations, rather than the idiosyncratic details of their architectures or training procedures.
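
As a rough illustration of what estimating latent dimensionality can involve, the sketch below computes the participation ratio of the covariance eigenspectrum of a matrix of unit activations (images by units). The abstract does not specify which estimator was used, so the participation ratio is an assumption here, chosen as one common measure of effective dimensionality; the function name `effective_dimensionality` and the toy inputs are hypothetical.

```python
import numpy as np


def effective_dimensionality(activations):
    """Participation ratio of the covariance eigenspectrum.

    activations: array of shape (n_images, n_units), e.g. one CNN layer's
    responses to a set of natural images (hypothetical input).
    The estimator is an assumption, not necessarily the one used in the paper.
    """
    X = np.asarray(activations, dtype=float)
    # Center each unit so the spectrum reflects variance, not mean activity.
    X = X - X.mean(axis=0, keepdims=True)
    # Squared singular values are proportional to the covariance eigenvalues;
    # the proportionality constant cancels in the ratio below.
    singular_values = np.linalg.svd(X, compute_uv=False)
    eigenvalues = singular_values ** 2
    total = eigenvalues.sum()
    if total == 0:
        return 0.0  # no variance at all
    # (sum of eigenvalues)^2 / (sum of squared eigenvalues)
    return float(total ** 2 / (eigenvalues ** 2).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Variance spread evenly over 10 units: the value approaches 10.
    isotropic = rng.standard_normal((5000, 10))
    print(effective_dimensionality(isotropic))
    # All units driven by a single latent variable: the value is close to 1.
    one_dim = np.outer(rng.standard_normal(100), np.ones(5))
    print(effective_dimensionality(one_dim))
```

Under this measure, a representation whose variance is concentrated in a few principal components scores low, while one whose variance is spread across many components scores high, which matches the low-versus-high dimensionality contrast drawn above.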