September 2024, Volume 24, Issue 10
Open Access
Vision Sciences Society Annual Meeting Abstract
Unveiling core, interpretable image properties underlying model-brain similarity with generative models
Author Affiliations
  • Yingqi Rong
    Johns Hopkins University
  • Colin Conwell
    Johns Hopkins University
  • Dianna Hidalgo
    Harvard Medical School
  • Michael Bonner
    Johns Hopkins University
Journal of Vision September 2024, Vol.24, 1240. doi:https://doi.org/10.1167/jov.24.10.1240
Abstract

Deep Neural Networks (DNNs) are now capable of predicting the hierarchy of natural image representations in human visual cortex with substantial accuracy. However, a key challenge in using these networks to predict representations in the brain is discerning the specific properties of the networks that underlie their predictive accuracy. In this work, we developed an approach that leverages high-throughput generative vision models to run targeted, hypothesis-driven experiments on the key image properties that drive DNN predictions of brain representation. Specifically, we used diffusion models to create diverse image variations while preserving targeted image information. This targeted information included specific visual features (e.g., edges, background) as well as semantics from captions and categories. Using our synthesized image variations, we quantified the impact of each interpretable manipulation on the representational similarity between AlexNet activations and image-evoked fMRI responses in early visual and occipitotemporal cortex (EVC, OTC). We found that representational similarity to high-level OTC (but not EVC) was stable as long as the synthesized images retained their semantic content, and this effect was robust to substantial structural variations in the synthesized images. To demonstrate the broad utility of this method, we quantified the influence of objects, backgrounds, shapes, and other visual details on model performance, and we performed analogous targeted experiments on aspects of higher-level scene semantics (e.g., object relations). Overall, these findings highlight the promise of using generative models to probe brain-model similarities. Our work provides insight into how specific forms of image information shape the relationship between computational models and brain responses, and it paves the way for a deeper understanding of how models approximate biological visual processing.
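As a rough illustration of the model-brain comparison described above, the sketch below computes a representational similarity score between AlexNet activations and a set of image-evoked fMRI responses using classical RSA (correlating representational dissimilarity matrices). This is a minimal sketch under stated assumptions, not the authors' pipeline: the choice of layer ("features.8"), the distance and correlation metrics, and the placeholder inputs (`images`, `fmri_otc`) are illustrative assumptions.

```python
# Minimal RSA sketch: compare AlexNet activations with fMRI responses to the
# same image set. Layer choice, metrics, and data loading are hypothetical;
# the original study's exact pipeline may differ.

import numpy as np
import torch
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr
from torchvision.models import alexnet, AlexNet_Weights
from torchvision.models.feature_extraction import create_feature_extractor


def rdm(features: np.ndarray) -> np.ndarray:
    """Condensed representational dissimilarity matrix (1 - Pearson r)."""
    return pdist(features, metric="correlation")


# Pretrained AlexNet; extract one mid-level convolutional layer as an example.
weights = AlexNet_Weights.DEFAULT
model = alexnet(weights=weights).eval()
extractor = create_feature_extractor(model, return_nodes={"features.8": "conv4"})
preprocess = weights.transforms()


def model_brain_similarity(images, fmri_otc):
    """images: list of PIL images (e.g., originals and diffusion-synthesized
    variants); fmri_otc: (n_images, n_voxels) array of responses in an ROI.
    Returns the Spearman correlation between model and brain RDMs."""
    with torch.no_grad():
        batch = torch.stack([preprocess(im) for im in images])
        acts = extractor(batch)["conv4"].flatten(start_dim=1).numpy()
    rho, _ = spearmanr(rdm(acts), rdm(fmri_otc))
    return rho
```

In this framing, an interpretable manipulation (e.g., replacing backgrounds while preserving semantics) would be tested by recomputing the similarity score on the synthesized image set and comparing it with the score for the original images.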
