Abstract
Color and form information are encoded throughout the human ventral visual hierarchy and throughout the processing layers of many convolutional neural networks (CNNs). However, these properties are often studied in isolation, leaving open the question of how they are encoded together. Here, we examine the relative coding strength of these two features in brain regions and CNN layers (i.e., how strongly each influences the representational geometry), and how this changes over the course of processing (i.e., from lower to higher visual regions and from lower to higher CNN layers). We collected fMRI responses from human V1, V2, V3, V4, and LOC to stimulus sets that each varied in both color and form features. We also collected the responses of multiple ImageNet-trained CNNs to the same stimuli. In Experiment 1, when orientation, a simple form feature, was varied, we found that color coding became increasingly dominant relative to form over the course of processing. By contrast, in Experiment 2, when curvature, a more complex form feature, was varied, form coding became increasingly dominant relative to color over the course of processing. We observed qualitatively similar results in both the human brain and CNNs, indicating that, relative to color coding, orientation coding decreases and curvature coding increases during visual processing. That said, despite similarities in the relative coding strengths of these features, CNNs and the human brain differed in how the absolute coding strengths of these features evolved during processing. Additionally, these similarities disappeared in untrained CNNs, suggesting that they arise from training the networks to recognize objects. Together, these results reveal how color and form jointly shape the visual representational space in both the human visual system and CNNs over the course of processing.
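To make the notion of relative coding strength concrete, the sketch below shows one way such a measure could be computed from response patterns. It is an illustration under stated assumptions, not the analysis used in this study: the correlation-distance metric, the contrast index, and the names `relative_coding_strength` and `make_toy_patterns` are all hypothetical choices made for the example. Given one response pattern per stimulus (voxels for a brain region, units for a CNN layer), it compares the mean dissimilarity of stimulus pairs that differ only in color with that of pairs that differ only in form.

```python
import numpy as np


def relative_coding_strength(patterns, colors, forms):
    """Compare color-driven and form-driven pattern dissimilarity.

    patterns: array of shape (n_stimuli, n_units), one response pattern per stimulus.
    colors, forms: label arrays of length n_stimuli.
    Returns (d_color, d_form, index), where index > 0 means color differences
    separate the patterns more than form differences do.
    All metric choices here are assumptions made for illustration.
    """
    patterns = np.asarray(patterns, dtype=float)
    colors = np.asarray(colors)
    forms = np.asarray(forms)

    # Representational dissimilarity matrix: 1 - Pearson correlation between rows.
    rdm = 1.0 - np.corrcoef(patterns)

    same_color = colors[:, None] == colors[None, :]
    same_form = forms[:, None] == forms[None, :]
    # Count each stimulus pair once (upper triangle, diagonal excluded).
    upper = np.triu(np.ones_like(rdm, dtype=bool), k=1)

    # Pairs that differ in exactly one feature.
    d_color = rdm[upper & ~same_color & same_form].mean()
    d_form = rdm[upper & same_color & ~same_form].mean()

    # Hypothetical contrast index, bounded in [-1, 1] for non-negative distances.
    index = (d_color - d_form) / (d_color + d_form)
    return d_color, d_form, index


def make_toy_patterns(w_color, w_form, n_units=200, seed=0):
    """Synthetic 2 colors x 2 forms stimulus set with weighted feature contributions."""
    rng = np.random.default_rng(seed)
    color_axes = rng.standard_normal((2, n_units))
    form_axes = rng.standard_normal((2, n_units))
    colors, forms, patterns = [], [], []
    for c in range(2):
        for f in range(2):
            colors.append(c)
            forms.append(f)
            patterns.append(w_color * color_axes[c] + w_form * form_axes[f])
    return np.array(patterns), np.array(colors), np.array(forms)
```

Applying such a function to each visual region or CNN layer in turn would give a profile of how the balance between the two features shifts over the course of processing. In the toy generator, increasing `w_color` relative to `w_form` pushes the index above zero, and reversing the weights pushes it below zero.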