Abstract
Along the human ventral visual stream, there are regions of cortex that show strong overall response selectivity for faces (the fusiform face area, FFA) and places (the parahippocampal place area, PPA). A prominent hypothesis is that these regions have distinct feature tuning specialized for their preferred stimulus domains. However, the presence of systematic information about non-preferred stimulus classes in these regions suggests an alternative hypothesis: that category-selective regions may reflect different facets of a larger-scale population code with an integrated feature space. We tested these competing hypotheses using deep neural networks. To operationalize specialized feature spaces, we trained a pair of resource-matched networks: one trained only on face images to recognize different identities, and the other trained only on place images to categorize scenes. To operationalize an “integrated” feature space, we trained a network to perform object recognition on images sampled from many different categories. Within this network, we localized face- and place-selective units in each layer, following the same procedures used when analyzing functional magnetic resonance imaging (fMRI) data. We then used representational similarity analyses to evaluate how well these different feature spaces matched human fMRI data in two datasets. Overall, human FFA and PPA representational geometries were better fit by the responses of face- and place-selective units defined from the integrated model (best layer FFA: r=0.85; PPA: r=0.74) than by the responses of the specialized models (FFA: r=0.62; PPA: r=0.47). More strikingly, variance partitioning analyses revealed that nearly all explainable, non-shared variance was accounted for by the integrated model. These patterns held across model layers, were robust across different base model architectures, and did not emerge in untrained networks. Together, these converging results suggest that representations in category-selective regions may be better understood as facets of a common feature bank that can discriminate among many classes of objects, rather than as distinct modules with specialized, unrelated feature tuning.
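To make the analysis pipeline concrete, the sketch below illustrates the kind of procedure the abstract describes: contrasting responses to face versus non-face localizer images to identify face-selective units within one model layer, building a representational dissimilarity matrix (RDM) from those units, and correlating it with an fMRI RDM from a region such as FFA. All array names, the selection threshold, and the use of Spearman correlation are illustrative assumptions (random numbers stand in for real activations and fMRI responses); this is a minimal sketch of the general approach, not the paper's exact implementation.

```python
import numpy as np
from scipy.stats import ttest_ind, spearmanr
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)

# --- Hypothetical stand-ins for real data ----------------------------------
# layer_acts_localizer: unit activations to face/non-face localizer images,
#   shape (n_localizer, n_units); simulated here with random numbers.
# layer_acts_conditions: activations to the condition set shared with fMRI.
# fmri_rdm: lower-triangle RDM of an fMRI ROI (e.g., FFA) over that set.
n_localizer, n_conditions, n_units = 200, 40, 512
layer_acts_localizer = rng.normal(size=(n_localizer, n_units))
layer_acts_conditions = rng.normal(size=(n_conditions, n_units))
is_face = rng.random(n_localizer) < 0.5
fmri_rdm = pdist(rng.normal(size=(n_conditions, 20)), metric="correlation")

# 1) Localize "face-selective" units: units whose responses to face images
#    exceed responses to non-face images (one-sided t-contrast; the exact
#    criterion is an assumption for this sketch).
t, p = ttest_ind(layer_acts_localizer[is_face],
                 layer_acts_localizer[~is_face], axis=0)
face_units = (t > 0) & (p / 2 < 0.05)

# 2) Build the model RDM from face-selective units only, using correlation
#    distance between condition-wise activation patterns.
model_rdm = pdist(layer_acts_conditions[:, face_units], metric="correlation")

# 3) Representational similarity analysis: rank-correlate model and fMRI RDMs.
rsa_fit, _ = spearmanr(model_rdm, fmri_rdm)
print(f"face-selective units: {face_units.sum()}, RSA fit: {rsa_fit:.3f}")
```

The same steps, with a place versus non-place contrast and a PPA RDM, would give the place-selective comparison; running them on the face-trained, place-trained, and object-trained networks yields the model fits the abstract compares.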