Abstract
Recent progress in vision science has focused on characterizing how the perceptual similarity of visual stimuli is reflected in the similarity of neural representations. While such neural similarity spaces are well established in simple feature domains (e.g., orientation columns in V1), a corresponding finding with complex real-world stimuli has yet to be demonstrated. We explored this topic using scene wheels (Son et al., 2021), continuous scene stimulus spaces created with scene-trained generative adversarial networks. We generated four different indoor scene wheels in which global scene properties (spatial layout, surface texture, component objects, colour scheme, etc.) changed gradually along a circular continuum. Participants viewed scene wheel images during fMRI scanning in a continuous carry-over design, which provides stable estimates of scene-specific neural patterns. After scanning, participants rated the pairwise perceptual similarity of the same scene wheel images. We also computed two physical similarity measures, the angular distance between images on the scene wheel and their pixel-wise correlation, as well as a semantic similarity measure based on the scene category assigned to each image by a classifier network. We then performed representational similarity analysis, comparing the similarity of scene-specific voxel patterns in multiple high-level visual regions with these measures of physical, perceptual, and semantic similarity. We found that for scene wheels constrained to a single scene category (e.g., dining rooms), neural patterns in visual cortex mainly represented the physical similarity of the scenes. However, when the scene wheels contained notable category boundaries (e.g., dining rooms and living rooms), both perceptual and categorical similarity structures were present in the neural pattern similarity. These results provide important evidence that similarity structures defined by the complex feature spaces of real-world scenes are coded in neural representations and that such representations flexibly carry physical, perceptual, and categorical information.
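The core analysis described above is a representational similarity analysis, in which model dissimilarity matrices (e.g., angular distance on the scene wheel, pixel-wise correlation distance) are compared with the dissimilarity of scene-specific voxel patterns. The sketch below illustrates that logic on synthetic data; it is a generic RSA illustration under assumed inputs, not the authors' analysis code, and all variable names, array shapes, and parameters (`wheel_angles_deg`, `images`, `neural_patterns`, the Spearman comparison) are hypothetical.

```python
# Minimal RSA sketch, assuming scene positions on a wheel, flattened images,
# and scene-specific voxel patterns for one region of interest are available.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr

# Hypothetical inputs (synthetic placeholders):
#   wheel_angles_deg : (n_scenes,)  angular position of each scene on the wheel
#   images           : (n_scenes, n_pixels)  flattened scene images
#   neural_patterns  : (n_scenes, n_voxels)  scene-specific voxel patterns
n_scenes = 36
rng = np.random.default_rng(0)
wheel_angles_deg = np.linspace(0, 360, n_scenes, endpoint=False)
images = rng.random((n_scenes, 64 * 64))
neural_patterns = rng.standard_normal((n_scenes, 500))

# Physical model 1: angular distance on the circular wheel (wraps at 180 deg).
diff = np.abs(wheel_angles_deg[:, None] - wheel_angles_deg[None, :])
angular_rdm = np.minimum(diff, 360 - diff)

# Physical model 2: pixel-wise dissimilarity (1 - Pearson correlation of images).
pixel_rdm = squareform(pdist(images, metric="correlation"))

# Neural RDM: 1 - correlation between scene-specific voxel patterns.
neural_rdm = squareform(pdist(neural_patterns, metric="correlation"))

# Compare each model RDM with the neural RDM on lower-triangular entries.
tri = np.tril_indices(n_scenes, k=-1)
for name, model_rdm in [("angular", angular_rdm), ("pixelwise", pixel_rdm)]:
    rho, p = spearmanr(model_rdm[tri], neural_rdm[tri])
    print(f"{name:9s} model vs. neural RDM: rho = {rho:.3f}, p = {p:.3g}")
```

In the same spirit, perceptual (rating-based) and semantic (classifier-based category) dissimilarity matrices could be substituted for the model RDMs above and compared with the neural RDM in each region of interest.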