Abstract
Category-selective regions are a prominent feature of the ventral visual pathway. Why is there specialization for some categories (e.g., faces, scenes), but not others (e.g., food, cars)? And why does functional specialization arise in the first place? Here, we used deep convolutional neural networks (CNNs) to test the hypothesis that face-specific regions are segregated from object regions because face and object recognition require different representations and computations.
We trained two separate AlexNet networks to categorize either faces or objects. On held-out face identities, the face-trained CNN significantly outperformed the object-trained CNN on face categorization, and the object-trained CNN likewise outperformed the face-trained CNN on object categorization, demonstrating that the representations optimized for one task are suboptimal for the other. To determine whether representations could be learned to simultaneously support both tasks, we trained dual-task CNNs with a branched architecture, varying the number of layers that were shared between face and object tasks (i.e., early vs. late branch points; Kell et al., 2018). We found that dual-task networks sharing late layers performed worse than CNNs trained on only faces or only objects. However, dual-task networks sharing only early processing stages, presumably like the primate visual system, showed no cost of sharing.
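To make the notion of a branch point concrete, the following minimal sketch splits a list of processing stages into a shared trunk and one task-specific copy of the remaining stages per task. It illustrates the idea only and is not the training code used here: the stage names (`conv1` through `fc8`), the function names, and the `n_shared` parameter are assumptions introduced for this example, chosen to mirror AlexNet's five convolutional and three fully connected layers.

```python
# Hypothetical stage names mirroring AlexNet's 5 convolutional + 3 fully connected layers.
ALEXNET_STAGES = ["conv1", "conv2", "conv3", "conv4", "conv5", "fc6", "fc7", "fc8"]


def branch_plan(stages, n_shared, tasks=("face", "object")):
    """Split stages into a shared trunk and a task-specific copy of the rest per task.

    n_shared is the branch point: the number of leading stages both tasks share.
    At least the final (output) stage must stay task-specific.
    """
    if not 0 <= n_shared < len(stages):
        raise ValueError("n_shared must leave at least the output stage task-specific")
    trunk = list(stages[:n_shared])
    branches = {
        task: [f"{name}_{task}" for name in stages[n_shared:]]
        for task in tasks
    }
    return trunk, branches


def forward_path(trunk, branches, task):
    """Ordered list of stages an input passes through for the given task."""
    return trunk + branches[task]


# Early branch point: only the first two stages are shared.
early_trunk, early_branches = branch_plan(ALEXNET_STAGES, n_shared=2)
# Late branch point: everything except the output stage is shared.
late_trunk, late_branches = branch_plan(ALEXNET_STAGES, n_shared=7)
```

In this sketch, a small `n_shared` corresponds to an early branch point and a large `n_shared` to a late one; either way each task passes through the same number of stages, and only the amount of sharing changes.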
Do these results generalize to architectures with larger capacity? We trained VGG16 networks on the same tasks. Surprisingly, in this case, even the fully shared dual-task CNN performed as well as the separate networks. However, lesion experiments showed that segregation of face and object processing had emerged spontaneously in the dual-task network. Critically, a dual-task network optimized for food and object categorization showed less task segregation. These results suggest that functional specialization in the brain exists for faces but not for food because food and object categorization can be performed by relying on common representations, whereas face and object recognition rely on inherently different computations.
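One simple way to quantify task segregation from lesion data is sketched below. This is an illustrative measure under stated assumptions, not necessarily the analysis used in this work: it supposes that lesioning each unit yields a performance drop on each task, takes the k most damaging units per task, and scores segregation as one minus the fraction of those units common to both tasks. The function names, the choice of k, and the toy numbers are all hypothetical and do not represent results.

```python
def top_units(drops, k):
    """Indices of the k units whose lesioning causes the largest performance drop."""
    ranked = sorted(range(len(drops)), key=lambda i: drops[i], reverse=True)
    return set(ranked[:k])


def segregation_index(drops_a, drops_b, k):
    """1.0 if the k most important units for each task are disjoint, 0.0 if identical.

    drops_a[i] and drops_b[i] are the performance drops on task A and task B
    after lesioning unit i (assumed to be measured separately beforehand).
    """
    if len(drops_a) != len(drops_b):
        raise ValueError("both tasks need one drop value per unit")
    if not 0 < k <= len(drops_a):
        raise ValueError("k must be between 1 and the number of units")
    overlap = len(top_units(drops_a, k) & top_units(drops_b, k))
    return 1.0 - overlap / k


# Toy, made-up drops for four units (illustration only, not data from this study).
face_drops = [0.30, 0.25, 0.01, 0.02]
object_drops = [0.01, 0.02, 0.20, 0.22]
segregated = segregation_index(face_drops, object_drops, k=2)

# Toy case where the same units matter most for both tasks.
shared_drops = [0.28, 0.20, 0.00, 0.01]
overlapping = segregation_index(face_drops, shared_drops, k=2)
```

Under this measure, a network in which lesions that impair one task leave the other intact scores near 1, while a network relying on common units for both tasks scores near 0.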