Abstract
How do category-selective neural responses in the ventral visual stream arise? One possibility is that brains have specializations for faces, bodies, and places because these categories serve distinct post-perceptual processes such as social cognition and navigation. Alternatively, simple exposure to natural visual input may suffice. The second hypothesis is hard to test in humans, but we can test its in-principle possibility by asking whether similar category-selective responses arise in deep neural networks (DNNs) that have no domain-specific priors and know nothing about the meaning of these categories to humans (see also Prince et al., 2023). We trained DNN models without supervision, using contrastive embedding objectives on an ecologically representative dataset (Ecoset), and then used nonnegative matrix factorization to discover the dominant components of the network's responses to natural images. These components included three with response profiles selective for places, faces, and food, respectively, as measured by correlations with (a) the responses of previously identified fMRI components from the ventral pathway (r = 0.6, 0.6, and 0.5; Khosla et al., 2022) and (b) human ratings of the salience of places (r = 0.4), faces (r = 0.5), and food (r = 0.5) in the images. Thus, category selectivities could arise in brains from natural visual input statistics, without strong domain-specific priors. What properties of the training diet are critical for these category selectivities to emerge? To assess the role of color, we trained a DNN on grayscale Ecoset images; this network largely retained selectivity for faces and places but not for food. In contrast, a DNN trained on "cutout" images, with backgrounds removed, failed to develop robust selectivity for any of these categories, despite achieving similar performance on image categorization.
These results suggest that while category selectivity could in principle emerge without domain-specific priors, from mere exposure to natural visual environments, the presence of full scene context may play a crucial role.
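The analysis described above (factorizing network responses into nonnegative components, then correlating each component's response profile with a reference profile) can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic stand-ins, the function names `nmf` and `pearson_r` are hypothetical, the factorization uses plain Lee-Seung multiplicative updates, and it assumes the response matrix is nonnegative (for example, post-ReLU activations), arranged as images by units.

```python
import numpy as np


def nmf(X, k, n_iter=500, seed=0, eps=1e-9):
    """Approximate nonnegative X (images x units) as W @ H.

    Uses Lee-Seung multiplicative updates for the Frobenius objective.
    W holds each component's response profile across images;
    H holds each component's weights across units.
    """
    rng = np.random.default_rng(seed)
    n_images, n_units = X.shape
    W = rng.random((n_images, k)) + eps
    H = rng.random((k, n_units)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def pearson_r(a, b):
    """Pearson correlation between two 1-D response profiles."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Synthetic stand-in for network responses (assumption: not real data).
rng = np.random.default_rng(1)
n_images, n_units, k = 200, 50, 3
reference_profiles = rng.random((n_images, k))  # stand-in for fMRI components or ratings
unit_weights = rng.random((k, n_units))
X = reference_profiles @ unit_weights           # nonnegative by construction

W, H = nmf(X, k)

# Relative reconstruction error of the factorization.
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)

# Correlate every discovered component with every reference profile,
# then take the best-matching component for each reference.
corr = np.array([[pearson_r(W[:, i], reference_profiles[:, j])
                  for j in range(k)] for i in range(k)])
best_match = corr.max(axis=0)
```

In a real analysis, `X` would be replaced by the trained network's responses to the stimulus images, and `reference_profiles` by the fMRI component responses or human salience ratings for the same images.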