Abstract
Although Convolutional Neural Networks (CNNs) differ in many ways from the human visual cortex, they have been proposed as capturing a feature space on which human visual categorization may be based. Here we compare the similarity among exemplars of scene categories using three measures: features extracted from the PlaceNet CNN model (Zhou et al., 2014), activity in the parahippocampal place area (PPA), and a behavioral measure of similarity. The same natural scene image set (Torralbo et al., 2013) was used to collect the similarity matrices for all three measures. This image set contains four categories (beach, city, highway, mountain), and each image was rated for how representative it is of its category. To obtain PPA activity, a passive-viewing fMRI study (N = 15) was conducted on a subset of the images (city and mountain). To obtain a behavioral measure of similarity, a same-different category judgment task was implemented, and response times were used to calculate the similarity matrices for each participant (N = 77). Euclidean distance among exemplars was used to construct the similarity matrices for each participant, and comparisons between matrices were calculated using Spearman’s rho correlations and tested with one-sample Wilcoxon signed-rank tests. Results showed that behavioral similarity matrices were significantly correlated with the matrices of later CNN layers. However, this pattern was not observed between PPA and CNN layers, although the correlations did improve when the analysis was limited to exemplars that were highly representative of their category. This pattern of data suggests that the feature spaces of CNN layers, especially the fully connected layers, share more structure with human behavior than with activity in the PPA, at least for this image set.
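The comparison described above (Euclidean distances among exemplars, Spearman correlation between matrices, and a one-sample Wilcoxon signed-rank test across participants) can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names, array shapes, and synthetic data are all assumptions made for illustration, and it assumes each measure can be expressed as one feature vector per exemplar, which may not match how the response-time matrices were actually built.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr, wilcoxon


def distance_vector(features):
    """Pairwise Euclidean distances between exemplars (rows), as the
    condensed upper triangle of the distance matrix."""
    return pdist(np.asarray(features, dtype=float), metric="euclidean")


def matrix_correlation(features_a, features_b):
    """Spearman's rho between the distance matrices of two measures."""
    rho, _ = spearmanr(distance_vector(features_a), distance_vector(features_b))
    return rho


def group_test(rhos):
    """One-sample Wilcoxon signed-rank test of per-participant rhos against zero."""
    stat, p = wilcoxon(rhos)
    return stat, p


# Synthetic stand-in data (hypothetical sizes, not the study's data).
rng = np.random.default_rng(0)
n_images, n_units, n_participants = 20, 50, 12
cnn_layer = rng.normal(size=(n_images, n_units))

# Each simulated participant is a noisy copy of the CNN layer's features.
rhos = []
for _ in range(n_participants):
    participant = cnn_layer + rng.normal(scale=0.5, size=cnn_layer.shape)
    rhos.append(matrix_correlation(participant, cnn_layer))

stat, p = group_test(rhos)
print(f"median rho = {np.median(rhos):.3f}, Wilcoxon W = {stat:.1f}, p = {p:.4g}")
```

Correlating only the upper triangle avoids counting each exemplar pair twice and excludes the zero diagonal, which would otherwise inflate the correlation.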