Nicholas M. Blauch, Filipe De Avila Belbute Peres, Juhi Farooqui, Alireza Chaman Zar, David Plaut, Marlene Behrmann; Assessing the similarity of cortical object and scene representations through cross-validated voxel encoding models. Journal of Vision 2019;19(10):188d. doi: https://doi.org/10.1167/19.10.188d.
© ARVO (1962-2015); The Authors (2016-present)
Object and scene perception are instantiated in overlapping networks of cortical regions, including three scene-selective areas in parahippocampal, occipital, and medial parietal cortex (PPA, OPA, and MPA), and a lateral occipital cortical area (LOC) selective for intact objects. The exact contributions of these regions to object and scene perception remain unknown. Here, we leverage BOLD5000 (Chang et al., 2018), a public fMRI dataset containing responses to ~5000 images from the ImageNet, COCO, and Scenes databases, to better understand the roles of these regions in visual perception. These databases differ in whether their images focus on single objects, a few objects, or whole scenes, respectively. We build voxel encoding models based on features from a deep convolutional neural network (DCNN) and assess how well these models generalize across all combinations of training and testing database (ImageNet, COCO, and Scenes). As predicted, for most DCNN layer/ROI encoding models we find good generalization between ImageNet and COCO, and poor generalization from ImageNet/COCO-trained models to Scenes. Surprisingly, the only generalization from ImageNet/COCO to Scenes occurs in early visual cortex, for encoding models based on intermediate DCNN layers. Additionally, LOC and PPA exhibit similarly good generalization between ImageNet and COCO and similarly poor generalization to Scenes. With the exception of MPA responses to Scenes, all scene-selective areas generalize well to held-out images from the training database, but PPA exhibits the most robust out-of-database generalization between ImageNet and COCO, suggesting a more general perceptual role. Our work represents a novel application of encoding models in neuroscience, in which distinct stimulus sets are used for training and testing in order to assess the similarity of the representations underlying these stimuli.
In future work, we plan to test the effect of pretraining the DCNN on Places365 rather than ImageNet, and to examine image-level predictors of generalization.
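The cross-database design described above can be illustrated with a minimal sketch. The abstract does not specify the regression method or the generalization score, so this example assumes closed-form ridge regression from DCNN-layer features to voxel responses, scored by the mean voxelwise Pearson correlation on held-out images. The data are synthetic stand-ins labeled "ImageNet", "COCO", and "Scenes": the first two share one feature-to-voxel mapping and the third uses a different one. All function names (`fit_ridge`, `voxelwise_r`, `generalization_matrix`, `make_db`) and parameter values are hypothetical.

```python
import numpy as np


def fit_ridge(X, Y, alpha):
    """Closed-form ridge regression (assumed method): W = (X^T X + alpha I)^-1 X^T Y."""
    A = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ Y)


def voxelwise_r(Y_true, Y_pred):
    """Pearson correlation between observed and predicted responses, one value per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    num = (yt * yp).sum(axis=0)
    den = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return num / den


def generalization_matrix(datasets, alpha=1.0, train_frac=0.8, seed=0):
    """Train on each database, test on the held-out split of every database.

    Entry M[i, j] is the mean voxelwise r of a model trained on database i
    and evaluated on held-out images of database j (held-out even when i == j).
    """
    rng = np.random.default_rng(seed)
    names = list(datasets)
    splits = {}
    for name in names:
        idx = rng.permutation(datasets[name][0].shape[0])
        cut = int(train_frac * len(idx))
        splits[name] = (idx[:cut], idx[cut:])
    M = np.zeros((len(names), len(names)))
    for i, train_name in enumerate(names):
        X, Y = datasets[train_name]
        train_idx, _ = splits[train_name]
        W = fit_ridge(X[train_idx], Y[train_idx], alpha)
        for j, test_name in enumerate(names):
            Xt, Yt = datasets[test_name]
            _, test_idx = splits[test_name]
            M[i, j] = voxelwise_r(Yt[test_idx], Xt[test_idx] @ W).mean()
    return names, M


def make_db(W_true, n_images, noise, rng):
    """Synthetic 'database': random features mapped to voxels by W_true plus noise."""
    X = rng.standard_normal((n_images, W_true.shape[0]))
    Y = X @ W_true + noise * rng.standard_normal((n_images, W_true.shape[1]))
    return X, Y


rng = np.random.default_rng(42)
n_features, n_voxels = 50, 20
W_shared = rng.standard_normal((n_features, n_voxels))  # mapping shared by two databases
W_other = rng.standard_normal((n_features, n_voxels))   # unrelated mapping for the third
datasets = {
    "ImageNet": make_db(W_shared, 400, 1.0, rng),
    "COCO": make_db(W_shared, 400, 1.0, rng),
    "Scenes": make_db(W_other, 400, 1.0, rng),
}
names, M = generalization_matrix(datasets, alpha=10.0)
print(names)
print(np.round(M, 2))
```

In this toy setup, the diagonal and the ImageNet/COCO off-diagonal entries should be high, while entries pairing either of them with Scenes should hover near zero, mirroring the qualitative pattern the abstract reports for most layer/ROI models. Testing every model on the held-out split of each database, rather than on all of its images, keeps test-set sizes comparable and prevents within-database scores from being inflated by training images.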