Abstract
The existence of face-selective neurons in higher visual cortex is one of the most robust findings in visual cognitive neuroscience. It is a matter of ongoing debate whether face selectivity is innately specified via long-term evolutionary pressures, or arises from extensive experience with faces during development. To examine this question, we used deep learning methods (Krizhevsky et al., 2012) to train a hierarchical convolutional neural network for optimal performance on an object recognition task, using a network architecture that has been shown to be highly predictive of neural responses in higher visual cortex (Yamins et al., 2014). Critically, this training involved no exposure to faces, bodies, or animate objects of any kind. We then tested this “face-deprived” network on a standard face-localizer image set (i.e., faces, bodies, scenes, objects, and scrambled objects). Surprisingly, we found a significant fraction of face-selective units (7.2±0.83%, p < 0.01). These units retained a strongly face-selective response profile on an independent set of highly variable testing images. We found significantly smaller fractions of units selective for other categories (bodies, scenes, etc.). To simulate the effect of a small amount of ecologically balanced visual experience, we then fit a linear transform of the face-deprived model units to the neural response profiles for a population of neural sites from macaque inferior temporal (IT) cortex, measured on a set of 256 images of which 32 contained faces. We found that the fraction of face-selective units in this simulated IT population when evaluated on the localizer set (28.3±3.3%, p < 0.01) is consistent with independent measurements from IT.
Taken together, these results suggest that the observed face selectivity in higher visual cortex may arise neither from face-specific innate genetic specification nor from significant face-focused experiential tuning, but rather may largely be the by-product of machinery built for non-face-specific object-recognition tasks.
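As a rough illustration of the unit-level analysis summarized above, the fraction of face-selective units could be estimated along the following lines. This is only a sketch under stated assumptions: the abstract does not specify the selectivity criterion, so the d′ index, the threshold of 1.0, and the names `d_prime` and `fraction_face_selective` are hypothetical choices made here for illustration, not the authors' method.

```python
import statistics


def d_prime(face, nonface):
    """Difference of mean responses divided by the pooled standard deviation.

    Assumed selectivity index; the abstract does not state which one was used.
    """
    pooled_var = (statistics.variance(face) + statistics.variance(nonface)) / 2
    if pooled_var == 0:
        return 0.0  # constant responses carry no selectivity information
    return (statistics.mean(face) - statistics.mean(nonface)) / pooled_var ** 0.5


def fraction_face_selective(responses, labels, threshold=1.0):
    """Fraction of units whose face-vs-other d' exceeds `threshold`.

    responses: one list per unit, holding that unit's response to each image.
    labels:    category label per image (e.g. "face", "body", "scene").
    threshold: hypothetical cutoff, chosen here only for illustration.
    """
    face_idx = [i for i, c in enumerate(labels) if c == "face"]
    other_idx = [i for i, c in enumerate(labels) if c != "face"]
    n_selective = 0
    for unit in responses:
        face = [unit[i] for i in face_idx]
        other = [unit[i] for i in other_idx]
        if d_prime(face, other) > threshold:
            n_selective += 1
    return n_selective / len(responses)


# Toy localizer set: four face images followed by four object images.
labels = ["face"] * 4 + ["object"] * 4
responses = [
    [5.0, 6.0, 5.0, 6.0, 1.0, 2.0, 1.0, 2.0],  # responds more to faces
    [1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0],  # no category preference
    [1.0, 2.0, 1.0, 2.0, 5.0, 6.0, 5.0, 6.0],  # responds more to objects
    [3.0] * 8,                                 # constant response
]
print(fraction_face_selective(responses, labels))
```

In this toy example only the first unit passes the assumed criterion. A real analysis would also need a significance test against a null distribution to support values such as the reported p < 0.01.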
Meeting abstract presented at VSS 2015