Abstract
Words constitute a unique, experience-dependent category within the representational space of the human ventral pathway. The prevailing view holds that learning to read repurposes a pre-existing region in the ventral occipitotemporal cortex for the recognition of written words. However, the original function of this region (the visual word form area, VWFA) remains elusive. In this study, using deep neural networks, we first show that considerable word discrimination capacity can be derived from training on general non-word object recognition. We find that objects closer to words in the network's 'object space' share more of the features that support word recognition. This is mirrored in the human brain: our fMRI studies show that objects closer to words in the object space elicit higher responses in the VWFA. More importantly, this effect also holds in the inferotemporal (IT) cortex of macaques, which are presumed to be naïve to words in terms of both evolutionary history and visual experience. Using fMRI, we identified word-selective areas in the macaques’ IT cortex. Furthermore, using wide-field imaging, we measured responses to 1000 words and 1000 objects in the anterior IT cortex of two macaques. The results align with the findings in the human brain: the word-selective area responds more strongly to objects that are close to words in the object space. By integrating findings from CNNs, human fMRI, and macaque fMRI and wide-field imaging, our work raises the possibility that the VWFA initially evolved to represent features of non-word objects that lie close to words in the object space, and sheds light on the general principles governing the genesis of category-specific areas in the IT cortex.