Abstract
Recent studies have adopted multivariate, data-driven approaches such as representational similarity analysis to examine the neural representations that underpin object recognition. This approach has been greatly facilitated by the recent development of large-scale datasets and neural encoding models that precisely predict neural responses to visual stimuli. The present study builds on these advancements to elucidate neural representations underlying object recognition that have remained elusive with existing methods and datasets. We first applied principal component analysis to voxel patterns of publicly available fMRI data derived from thousands of visual images in the Natural Scenes Dataset (NSD; Allen et al., 2022). This analysis allowed us to capture the image distributions along each principal component and, more importantly, to identify regions of the high-dimensional voxel response space with sparse or no corresponding images. To understand the visual information represented in these blank areas, we optimized the latent vector of a generative model (an autoencoder or BigGAN-deep) to generate an image eliciting a voxel pattern corresponding to a blank area. This was achieved by combining the generative model with an encoding model (the feature-weighted receptive field model; St-Yves & Naselaris, 2018), which was trained to predict the voxel pattern evoked by a visual image. We found that the autoencoder successfully synthesized visual images anticipated to elicit the desired voxel patterns for each brain region. In contrast, BigGAN-deep failed to synthesize such images, likely due to the strong constraint imposed by its class embedding. Our approach enables the exploration of "unexplored" areas in the high-dimensional voxel response space, potentially leading to the discovery of novel neural representations.
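The core procedure, optimizing a generator's latent vector so that an encoding model predicts a target voxel pattern, can be sketched in miniature. Everything below is a hypothetical stand-in, not the authors' implementation: the "generator" `G` and "encoding model" `W` are toy linear maps rather than an autoencoder and the feature-weighted receptive field model, and the target is a random point standing in for a sparsely sampled region of voxel space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (all names and sizes hypothetical): a linear "generator" G
# mapping a latent vector to an image, and a linear "encoding model" W
# mapping an image to a predicted voxel pattern.
n_latent, n_pix, n_vox = 16, 64, 32
G = rng.standard_normal((n_pix, n_latent))
W = rng.standard_normal((n_vox, n_pix)) / np.sqrt(n_pix)

def predicted_voxels(z):
    """Generator -> encoding model: latent z -> image -> voxel pattern."""
    return W @ (G @ z)

# Target voxel pattern: a point standing in for a "blank" (sparsely
# sampled) area of the high-dimensional voxel response space.
target = rng.standard_normal(n_vox)

# Optimize the latent vector by gradient descent on the squared error
# between the predicted and target voxel patterns.
z = np.zeros(n_latent)
lr = 0.01
for _ in range(2000):
    err = predicted_voxels(z) - target   # residual in voxel space
    grad = (W @ G).T @ err               # chain rule through both linear maps
    z -= lr * grad

initial_loss = float(np.sum(target ** 2))
final_loss = float(np.sum((predicted_voxels(z) - target) ** 2))
```

In the actual study the generator and encoding model are nonlinear, so the gradient would be obtained by automatic differentiation rather than the closed-form chain rule used here; the class-embedding constraint of BigGAN-deep can be understood as restricting the reachable set of images in such an optimization.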
Additionally, image synthesis with the encoding model may offer a more feasible means of inducing specific voxel patterns to enhance brain function or behavior, providing an advantage over conventional methods such as decoded neurofeedback, in which subjects voluntarily control their voxel patterns based on real-time feedback.