September 2019
Volume 19, Issue 10
Open Access
Vision Sciences Society Annual Meeting Abstract  |   September 2019
Comparing visual object representational similarity in convolutional neural networks and the human ventral visual regions
Author Affiliations & Notes
  • Yaoda Xu
    Psychology Department, Yale University
  • Maryam Vaziri-Pashkam
    National Institute of Mental Health
Journal of Vision September 2019, Vol.19, 249c. doi:

      Yaoda Xu, Maryam Vaziri-Pashkam; Comparing visual object representational similarity in convolutional neural networks and the human ventral visual regions. Journal of Vision 2019;19(10):249c.

      © ARVO (1962-2015); The Authors (2016-present)


Convolutional neural networks (CNNs) have achieved remarkable success in visual object categorization tasks in recent years. Some researchers have considered them good working models of the primate visual system, with different CNN layers corresponding to different levels of visual processing in the primate brain. However, much remains unknown about how visual information is processed within a CNN, making it more like a black box than a well-understood system. Using methods developed in human neuroscience, here we examined information processing in several CNNs using two approaches. In our first approach, we compared the representational similarities of visual object categories from the different CNN layers to those obtained through fMRI in retinotopically and functionally defined human occipito-temporal regions. Regardless of the specific CNN examined, when natural object categories were used, the representational similarities of the early CNN layers aligned well with those obtained from early visual areas, presumably because these layers were modeled directly after these brain regions. However, the representational similarities of the later CNN layers diverged from those of the higher ventral visual object processing regions. When artificial shape categories were used, both early and late CNN layers diverged from the corresponding human visual regions. In our second approach, we compared visual representations in CNNs and human visual processing regions in terms of their tolerance to changes in image format, position, size, and spatial frequency content. Again, we observed several differences between CNNs and the human visual regions. Thus, despite CNNs' ability to successfully perform visual object categorization, they process visual information somewhat differently from the human brain. As such, current CNNs may not be directly applicable to modeling the details of the human visual system.
Nevertheless, such brain and CNN comparisons can be useful to guide the training of more human brain-like CNNs.
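The first approach compares representational similarity structure across systems, a technique usually called representational similarity analysis (RSA). The following is a minimal sketch under stated assumptions, not the authors' pipeline: it assumes correlation distance (1 minus Pearson r) to build each representational dissimilarity matrix (RDM) and a Spearman correlation to compare two RDMs, since the abstract does not state which metrics were used. The function names and the synthetic "layer" and "region" data are hypothetical.

```python
import numpy as np


def rdm(patterns: np.ndarray) -> np.ndarray:
    """Representational dissimilarity matrix using correlation distance.

    Rows of `patterns` are object categories; columns are CNN units or
    fMRI voxels. Entry (i, j) is 1 - Pearson r between categories i and j.
    """
    return 1.0 - np.corrcoef(patterns)


def upper_triangle(m: np.ndarray) -> np.ndarray:
    """Off-diagonal upper-triangle entries (an RDM is symmetric)."""
    i, j = np.triu_indices(m.shape[0], k=1)
    return m[i, j]


def spearman(a: np.ndarray, b: np.ndarray) -> float:
    """Spearman correlation as Pearson r on ranks.

    Assumption: no tied values, which is typical for continuous RDM entries.
    """
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])


def rdm_similarity(x: np.ndarray, y: np.ndarray) -> float:
    """Compare two systems' category geometry via their RDMs.

    `x` and `y` must have the same categories in the same row order, but
    may have different numbers of units/voxels (columns).
    """
    return spearman(upper_triangle(rdm(x)), upper_triangle(rdm(y)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: 8 categories sharing a latent structure.
    latent = rng.normal(size=(8, 20))
    layer = latent @ rng.normal(size=(20, 512)) + 0.5 * rng.normal(size=(8, 512))
    region = latent @ rng.normal(size=(20, 200)) + 0.5 * rng.normal(size=(8, 200))
    print("layer vs region RDM similarity:", rdm_similarity(layer, region))
```

Because the comparison operates on RDMs rather than raw activations, a CNN layer with hundreds of units can be compared with a brain region measured with a different number of voxels. Repeating `rdm_similarity` for each layer against each region yields the kind of layer-by-region correspondence the abstract describes for early versus late layers.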

