December 2022
Volume 22, Issue 14
Open Access
Vision Sciences Society Annual Meeting Abstract  |   December 2022
Latent dimensionality scales with the performance of deep learning models of visual cortex
Author Affiliations
  • Eric Elmoznino
    Johns Hopkins University
  • Michael Bonner
    Johns Hopkins University
Journal of Vision December 2022, Vol.22, 3704. doi:https://doi.org/10.1167/jov.22.14.3704
      Eric Elmoznino, Michael Bonner; Latent dimensionality scales with the performance of deep learning models of visual cortex. Journal of Vision 2022;22(14):3704. https://doi.org/10.1167/jov.22.14.3704.



      © ARVO (1962-2015); The Authors (2016-present)

Abstract

The ventral visual stream is a complex, nonlinear system whose internal representations are currently best approximated through deep learning in convolutional neural networks (CNNs). Neuroscientists have been in search of the core principles that explain why some CNNs are better than others at predicting the responses in visual cortex. Previous efforts have focused on factors related to architecture, training task, visual diet, and interpretable properties of the learned features. Here, we take a different approach and seek to understand the performance of CNN models of the ventral stream in terms of their latent geometric properties. Specifically, we focus on latent dimensionality, which is the number of dimensions spanned by the activity space of a CNN’s responses to natural images. While low dimensionality can promote invariance to incidental image properties, high dimensionality increases expressivity and can support a wider range of behaviors. Thus, we asked: what level of CNN dimensionality is best for modeling activity in the primate ventral stream? To address this question, we estimated the dimensionality of a large set of CNNs trained on a variety of tasks using multiple datasets, and we assessed how well these CNNs performed as linear encoding models of object-evoked responses in ventral visual cortex using both human fMRI and monkey electrophysiology data. These analyses revealed a striking effect: higher dimensional CNNs were better at predicting cortical responses. Importantly, our results cannot be explained as trivial statistical effects of dimensionality when fitting linear encoding models. There are, in fact, many alternative conditions under which high-dimensional models cannot accurately predict neural data, which we demonstrated both empirically and through simulations. 
Together, our findings suggest that CNN models of visual cortex may be best understood in terms of the latent geometric properties of their representations, rather than the idiosyncratic details of their architectures or training procedures.
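
The abstract does not say which dimensionality estimator or regression procedure was used, so the following is only an illustrative sketch under stated assumptions. It assumes the participation ratio of the PCA eigenspectrum as one common way to quantify latent dimensionality, and a closed-form ridge regression with a single train/test split as a stand-in for a linear encoding model. The function names `effective_dimensionality` and `encoding_score`, and the parameters `alpha`, `train_fraction`, and `seed`, are hypothetical and not taken from the original work.

```python
import numpy as np


def effective_dimensionality(activations: np.ndarray) -> float:
    """Participation ratio of the PCA eigenspectrum (an assumed estimator).

    activations: array of shape (n_images, n_features).
    Returns (sum of eigenvalues)^2 / (sum of squared eigenvalues).
    """
    centered = activations - activations.mean(axis=0, keepdims=True)
    # Squared singular values of the centered data, divided by n - 1,
    # are the eigenvalues of the sample covariance matrix.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    eigenvalues = singular_values ** 2 / (activations.shape[0] - 1)
    return float(eigenvalues.sum() ** 2 / (eigenvalues ** 2).sum())


def encoding_score(
    features: np.ndarray,
    responses: np.ndarray,
    alpha: float = 1.0,
    train_fraction: float = 0.8,
    seed: int = 0,
) -> float:
    """Held-out accuracy of a ridge-regression encoding model (assumed setup).

    features:  (n_images, n_features) model activations.
    responses: (n_images, n_channels) neural responses (voxels or units).
    Returns the mean Pearson correlation between predicted and observed
    held-out responses, averaged over response channels.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(features.shape[0])
    n_train = int(train_fraction * len(order))
    train, test = order[:n_train], order[n_train:]

    # Center using training statistics only.
    x_mean = features[train].mean(axis=0)
    y_mean = responses[train].mean(axis=0)
    x_train = features[train] - x_mean
    y_train = responses[train] - y_mean

    # Closed-form ridge solution: W = (X^T X + alpha * I)^{-1} X^T Y.
    gram = x_train.T @ x_train + alpha * np.eye(x_train.shape[1])
    weights = np.linalg.solve(gram, x_train.T @ y_train)
    predicted = (features[test] - x_mean) @ weights + y_mean

    # Pearson correlation per response channel on the held-out images.
    p = predicted - predicted.mean(axis=0)
    r = responses[test] - responses[test].mean(axis=0)
    corr = (p * r).sum(axis=0) / (
        np.linalg.norm(p, axis=0) * np.linalg.norm(r, axis=0)
    )
    return float(corr.mean())


if __name__ == "__main__":
    # Synthetic stand-ins for model activations and neural responses.
    rng = np.random.default_rng(0)
    fake_features = rng.standard_normal((500, 20))
    fake_responses = fake_features @ rng.standard_normal((20, 5))
    fake_responses += 0.1 * rng.standard_normal(fake_responses.shape)
    print("effective dimensionality:", effective_dimensionality(fake_features))
    print("encoding score:", encoding_score(fake_features, fake_responses))
```

In this sketch, comparing models would amount to computing both quantities for each network's activations and examining how the encoding score varies with effective dimensionality. The participation ratio never exceeds the number of nonzero eigenvalues, so it is close to 1 when a single direction dominates the variance and approaches the feature count when variance is spread evenly.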
