September 2024
Volume 24, Issue 10
Open Access
Vision Sciences Society Annual Meeting Abstract  |   September 2024
Leveraging Vision and Language Generative Models to Understand the Visual Cortex
Author Affiliations & Notes
  • Andrew Luo
    Carnegie Mellon University
  • Margaret Henderson
    Carnegie Mellon University
  • Leila Wehbe
    Carnegie Mellon University
  • Michael Tarr
    Carnegie Mellon University
  • Footnotes
    Acknowledgements: This work used Bridges-2 at Pittsburgh Supercomputing Center through allocation SOC220017 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program, which is supported by National Science Foundation grants #2138259, #2138286, #2138307, #2137603, and #2138296.
Journal of Vision September 2024, Vol.24, 1333. doi:https://doi.org/10.1167/jov.24.10.1333
Citation: Andrew Luo, Margaret Henderson, Leila Wehbe, Michael Tarr; Leveraging Vision and Language Generative Models to Understand the Visual Cortex. Journal of Vision 2024;24(10):1333. https://doi.org/10.1167/jov.24.10.1333.

© ARVO (1962-2015); The Authors (2016-present)

Abstract

Understanding the functional organization of the higher visual cortex is a fundamental goal in neuroscience. Traditional approaches have mapped the visual and semantic selectivity of neural populations using hand-selected, non-naturalistic stimuli, an approach that requires a priori hypotheses about visual cortex selectivity. To address these limitations, we introduce two data-driven methods: Brain Diffusion for Visual Exploration ('BrainDiVE') and Semantic Captioning Using Brain Alignments ('BrainSCUBA'). Trained on a dataset of natural images paired with fMRI recordings, BrainDiVE synthesizes images predicted to activate specific brain regions, bypassing the need for hand-crafted visual stimuli. The approach combines large-scale diffusion models with brain-gradient guided image synthesis. We demonstrate the synthesis of preferred images with high semantic specificity for category-selective regions of interest (ROIs). The method further enables the characterization of differences and novel functional subdivisions within ROIs, which we validated with behavioral data. BrainSCUBA, in turn, generates natural language descriptions of the images predicted to maximally activate individual voxels. Using a contrastive vision-language model and a pre-trained large language model, BrainSCUBA produces interpretable captions that also enable text-conditioned image synthesis; the generated images are semantically coherent and achieve high predicted activations. In exploratory studies of how 'person' representations are distributed across the brain, we observe fine-grained semantic selectivity in body-selective areas. Together, these two methods provide well-specified constraints for future hypothesis-driven examinations and demonstrate the potential of data-driven approaches for uncovering the organization of the visual cortex.
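To make the two pipelines concrete, the sketches below illustrate the core ideas under stated assumptions; they are minimal illustrations, not the authors' released code. The first sketch shows brain-gradient guided synthesis in the spirit of BrainDiVE: an image-to-voxel encoder (here a hypothetical toy network standing in for an encoder fit on natural images and paired fMRI recordings) supplies a gradient that nudges an image toward higher predicted activation of a target ROI. A full pipeline would fold this gradient into the denoising steps of a large-scale diffusion model rather than taking plain gradient-ascent steps on pixels.

# Hypothetical sketch of brain-gradient guided synthesis (BrainDiVE-style).
# The encoder and the update step are stand-ins, not the published method.
import torch
import torch.nn as nn

class VoxelEncoder(nn.Module):
    """Toy image -> predicted voxel-activation encoder (placeholder)."""
    def __init__(self, n_voxels: int = 1000):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        )
        self.readout = nn.Linear(16 * 4 * 4, n_voxels)

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        return self.readout(self.backbone(img))

def guided_step(img, encoder, roi_mask, guidance_scale: float = 0.1):
    """One guidance update: move the image along the gradient that
    increases mean predicted activation of the target ROI voxels."""
    img = img.detach().requires_grad_(True)
    roi_activation = encoder(img)[:, roi_mask].mean()
    grad = torch.autograd.grad(roi_activation, img)[0]
    # In the full method this gradient would be mixed into a diffusion
    # denoising step; here we simply take a small gradient-ascent step.
    return (img + guidance_scale * grad).detach()

encoder = VoxelEncoder()
roi_mask = torch.zeros(1000, dtype=torch.bool)
roi_mask[:50] = True  # pretend the first 50 voxels form a category-selective ROI
img = torch.randn(1, 3, 64, 64)
for _ in range(10):
    img = guided_step(img, encoder, roi_mask)

The second sketch illustrates the BrainSCUBA-style captioning idea under similar assumptions: a voxel's linear weight vector, fit in a shared contrastive vision-language embedding space, is softly projected onto the manifold of natural-image embeddings and matched against candidate caption embeddings. The random tensors are placeholders for real CLIP-style features, and a full pipeline would decode a fluent caption with a pre-trained language model rather than retrieving one from a fixed pool.

# Hypothetical BrainSCUBA-style sketch with random stand-in embeddings.
import torch
import torch.nn.functional as F

dim, n_images, n_captions = 512, 10_000, 5_000
image_embeds = F.normalize(torch.randn(n_images, dim), dim=-1)    # natural-image features
caption_embeds = F.normalize(torch.randn(n_captions, dim), dim=-1)  # candidate caption features
voxel_weight = F.normalize(torch.randn(dim), dim=-1)              # one voxel's encoder weights

# Softmax-weighted projection of the voxel direction onto the natural-image manifold.
sims = image_embeds @ voxel_weight
weights = torch.softmax(sims / 0.1, dim=0)
projected = F.normalize((weights[:, None] * image_embeds).sum(0), dim=-1)

# Retrieve the caption whose embedding best matches the projected voxel direction.
best_caption_idx = int((caption_embeds @ projected).argmax())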
