Abstract
Understanding the functional organization of the higher visual cortex is a fundamental goal in neuroscience. Traditional approaches have focused on mapping the visual and semantic selectivity of neural populations using hand-selected, non-naturalistic stimuli, an approach that requires a priori hypotheses about the selectivity of the visual cortex. To address these limitations, we introduce two data-driven methods: Brain Diffusion for Visual Exploration ('BrainDiVE') and Semantic Captioning Using Brain Alignments ('BrainSCUBA'). BrainDiVE, trained on a dataset of natural images and paired fMRI recordings, synthesizes images predicted to activate specific brain regions, bypassing the need for hand-crafted visual stimuli; the approach combines large-scale diffusion models with brain-gradient-guided image synthesis. We demonstrate the synthesis of preferred images with high semantic specificity for category-selective regions of interest (ROIs), and further use the method to characterize differences and novel functional subdivisions within these ROIs, which we validate with behavioral data. BrainSCUBA, in contrast, generates natural language descriptions of images predicted to maximally activate individual voxels. Built on a contrastive vision-language model and a pre-trained large language model, it produces interpretable captions that can in turn drive text-conditioned image synthesis; the resulting images are semantically coherent and achieve high predicted activations. In exploratory studies of how 'person' representations are distributed across the brain, we observe fine-grained semantic selectivity in body-selective areas. Together, these two methods provide well-specified constraints for future hypothesis-driven investigations and demonstrate the potential of data-driven approaches for uncovering the organization of the visual cortex.
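
To make the brain-gradient guidance idea behind BrainDiVE concrete, the following minimal Python sketch shows how a single reverse-diffusion step could be nudged by the gradient of an image-to-brain encoder. It is an illustrative assumption, not the released implementation: `denoiser`, `voxel_encoder`, `roi_mask`, and `guidance_scale` are hypothetical placeholders standing in for a pretrained diffusion model, a voxel-wise encoding model, a set of target ROI voxels, and a tunable step size.

    import torch

    def guided_denoising_step(x_t, t, denoiser, voxel_encoder, roi_mask, guidance_scale=1.0):
        # One reverse-diffusion step, nudged toward higher predicted activation in a target ROI.
        # Assumptions: denoiser(x_t, t) returns the standard denoised image estimate for step t,
        # and voxel_encoder(x) returns predicted voxel activations of shape [batch, n_voxels].
        x_t = x_t.detach().requires_grad_(True)
        roi_activation = voxel_encoder(x_t)[:, roi_mask].mean()   # mean predicted ROI response
        grad = torch.autograd.grad(roi_activation, x_t)[0]        # d(activation) / d(image)
        with torch.no_grad():
            x_prev = denoiser(x_t, t)                             # ordinary diffusion update
            x_prev = x_prev + guidance_scale * grad               # brain-gradient guidance
        return x_prev

In practice the guidance strength is typically rescaled per step and the encoder is applied to a decoded image estimate rather than the raw noisy latent; the sketch only conveys the core idea of steering synthesis with the gradient of a predicted brain response.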