Abstract
Inferior temporal (IT) cortex of primates is topographically organized, with multiple large clusters of selectivity for different stimulus domains, including faces, bodies, and scenes, organized along a medial-lateral axis corresponding to the peripheral-foveal layout of earlier retinotopic cortex. In the homologous ventral temporal cortex (VTC) of humans, additional lateral word selectivity is seen, with a relative left-lateralization that mirrors the relative right-lateralization of face selectivity. How does this topographic organization emerge, and what factors govern its consistent global layout? Recent computational modeling work using Interactive Topographic Networks has demonstrated that learning under biological constraints on the spatial cost and sign of connections within IT/VTC is sufficient to produce domain-selective clusters. Here, we test whether additionally constrained connectivity with early retinotopic areas and with downstream non-visual areas, in combination with domain-biased viewing conditions and task demands, produces the global layout of human VTC in a bi-hemispheric model. Retinotopic constraints are modeled by adding a spatial cost on feedforward connections from the polar-coordinate convolutional retinotopy of V4 into posterior VTC within each hemisphere of the model. Viewing conditions are modeled as distributions of relative image size and fixation likelihood, with realistic domain-specific parameters. Downstream language demands are modeled by an additional left-lateralized “language” system with connectivity restricted to model LH anterior VTC. Learning in the model accounts for 1) the retinotopically constrained layout of domain selectivity for words, faces, objects, and scenes along a lateral-medial or foveal-peripheral axis, and 2) a hemispheric organization in which words are relatively left-lateralized and, due to competition with words, faces are relatively right-lateralized.
Our work instantiates the most complete computational model of human VTC topography to date, and paves the way for future work incorporating a dorsal stream, ventral-dorsal interactions, and more detailed downstream task demands.
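
To make the idea of a spatial cost on connections concrete, the following is a minimal illustrative sketch, not the implementation used in this work. It assumes that source and destination units sit on a shared unit-square sheet and that the cost is a distance-weighted L1 penalty averaged over all connections; the function names `grid_coords` and `wiring_cost` are hypothetical, and the polar-coordinate retinotopy of V4 is not modeled here.

```python
import math


def grid_coords(side):
    """Place side*side units on a regular grid spanning the unit square.

    Assumption for illustration: a flat Cartesian sheet, not a polar layout.
    """
    step = 1.0 / (side - 1) if side > 1 else 0.0
    return [(i * step, j * step) for i in range(side) for j in range(side)]


def wiring_cost(weights, src, dst):
    """Mean of |w_ij| * d_ij over all connections.

    weights[i][j] is the connection from source unit j to destination unit i;
    d_ij is the Euclidean distance between their sheet positions.
    """
    total = 0.0
    count = 0
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            total += abs(w) * math.dist(dst[i], src[j])
            count += 1
    return total / count


def identity(n):
    """n x n weight matrix connecting each unit only to its aligned partner."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


coords = grid_coords(4)
local = identity(16)    # each unit wired to the unit at the same position
distant = local[::-1]   # each unit wired to the unit at the opposite corner

local_cost = wiring_cost(local, coords, coords)
distant_cost = wiring_cost(distant, coords, coords)
```

Under these assumptions, spatially aligned wiring incurs no penalty while wiring to distant positions is penalized. Adding such a term to a training loss would favor short connections, which is the general kind of pressure the abstract describes for feedforward connections into posterior VTC.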