August 2023
Volume 23, Issue 9
Open Access
Vision Sciences Society Annual Meeting Abstract  |   August 2023
Visual angle and image context alter the alignment between deep convolutional neural networks and the macaque ventral stream
Author Affiliations & Notes
  • Sara Djambazovska
    Swiss Federal Institute of Technology, Lausanne (EPFL)
    Harvard Medical School
  • Gabriel Kreiman
    Harvard Medical School
  • Kohitij Kar
    York University
  • Footnotes
    Acknowledgements  KK was supported by the Canada Research Chair Program. This research was undertaken thanks in part to funding from the Canada First Research Excellence Fund. SD was supported by the Bertarelli Foundation. We also thank Jim DiCarlo and Sarah Goulding for their support.
Journal of Vision August 2023, Vol.23, 5539. doi:

      Sara Djambazovska, Gabriel Kreiman, Kohitij Kar; Visual angle and image context alter the alignment between deep convolutional neural networks and the macaque ventral stream. Journal of Vision 2023;23(9):5539.

      © ARVO (1962-2015); The Authors (2016-present)


A family of deep convolutional neural networks (DCNNs) currently best explains the primate ventral stream activity that supports object recognition. Such models are often evaluated with neurobehavioral datasets in which the stimuli are presented in the subjects’ central field of view (FOV). However, the exact visual angle varies widely across studies (e.g., 8 degrees in Yamins et al., 2014; 2.9 degrees in Khaligh-Razavi et al., 2014; and 2 degrees, tailored to V1 neuronal receptive fields, in Cadena et al., 2019). A unified model of the primate visual system cannot have a varying FOV. Similarly, the type of images used for model evaluation varies across studies, ranging from objects embedded in randomized contexts (Yamins et al., 2014) to objects with no context (Khaligh-Razavi et al., 2014). Here we systematically tested how the predictivity of macaque inferior temporal (IT) neurons by DCNNs depends on the FOV and the image context. We used images (“full-context”) from the Microsoft COCO image set. We performed large-scale recordings in one macaque (~100 IT sites) while the monkey passively fixated images presented at 20 and 30 degrees. To estimate the optimal FOV for the DCNNs, we compared the DCNN IT predictivity at varying image crop sizes. We observed that crops spanning ~8-10 degrees of visual angle produced the strongest DCNN IT predictions. Next, to test the effect of image context, we generated two versions of the original images: object only (“no-context”) and swapped backgrounds (“incongruent-context”). DCNN IT predictivity was significantly lower for “incongruent-context” images than for “no-context” and “full-context” images. Interestingly, we observed a more pronounced gap between early (90-120 ms) and late (150-180 ms) response predictivity for “incongruent-context” images than for “no-context” and “full-context” images, suggesting stronger putative feedback signals during such contextual manipulations.
In sum, our results provide critical constraints to guide the development of more brain-aligned DCNN models of primate vision.
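
The crop-size comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not state how crops were sized or which regression was used. The sketch assumes a linear pixels-per-degree mapping (a small-angle approximation) and substitutes cross-validated ridge regression with a per-site Pearson correlation as the predictivity measure. The function names `center_crop_for_fov` and `ridge_predictivity`, and the parameters `alpha` and `n_folds`, are hypothetical.

```python
import numpy as np


def center_crop_for_fov(image, presented_deg, target_deg):
    """Return the central crop spanning `target_deg` of an image shown at `presented_deg`.

    Assumption: pixels map linearly to degrees of visual angle.
    """
    if not 0 < target_deg <= presented_deg:
        raise ValueError("target_deg must be in (0, presented_deg]")
    h, w = image.shape[:2]
    crop_h = max(1, round(h * target_deg / presented_deg))
    crop_w = max(1, round(w * target_deg / presented_deg))
    top, left = (h - crop_h) // 2, (w - crop_w) // 2
    return image[top:top + crop_h, left:left + crop_w]


def ridge_predictivity(features, responses, alpha=1.0, n_folds=5, seed=0):
    """Cross-validated ridge predictivity: Pearson r per recording site.

    features:  (n_images, n_features) model activations (assumed input)
    responses: (n_images, n_sites) neural responses (assumed input)
    """
    features = np.asarray(features, dtype=float)
    responses = np.asarray(responses, dtype=float)
    rng = np.random.default_rng(seed)
    order = rng.permutation(features.shape[0])
    predictions = np.zeros_like(responses)
    for test_idx in np.array_split(order, n_folds):
        train_idx = np.setdiff1d(order, test_idx)
        x_train, y_train = features[train_idx], responses[train_idx]
        x_mean, y_mean = x_train.mean(axis=0), y_train.mean(axis=0)
        x_centered = x_train - x_mean
        # Closed-form ridge solution on centered training data.
        gram = x_centered.T @ x_centered + alpha * np.eye(x_centered.shape[1])
        weights = np.linalg.solve(gram, x_centered.T @ (y_train - y_mean))
        predictions[test_idx] = (features[test_idx] - x_mean) @ weights + y_mean
    # Pearson correlation between held-out predictions and responses, per site.
    p = predictions - predictions.mean(axis=0)
    r = responses - responses.mean(axis=0)
    return (p * r).sum(axis=0) / np.sqrt((p ** 2).sum(axis=0) * (r ** 2).sum(axis=0))


if __name__ == "__main__":
    # Synthetic stand-ins for an image, model features, and IT responses.
    image = np.zeros((300, 300, 3))
    crop = center_crop_for_fov(image, presented_deg=30, target_deg=10)
    print(crop.shape)

    rng = np.random.default_rng(1)
    feats = rng.normal(size=(200, 10))
    resp = feats @ rng.normal(size=(10, 5)) + 0.1 * rng.normal(size=(200, 5))
    print(np.median(ridge_predictivity(feats, resp)))
```

Under these assumptions, one would crop every image at each candidate FOV, extract model features from the crops, and compare the median per-site correlation across crop sizes. The same function could be applied separately to early and late response windows.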

