Abstract
Mid-level visual features directly support an array of behaviors; thus, they may be critical for understanding the functional organization of visual cortex. However, attempts to characterize mid-level features have been hampered by the difficulty of describing these features in words: they exist in an “ineffable valley” between the describable patterns of low-level vision (e.g., edges) and the commonsense concepts of visual cognition (e.g., objects). Here we developed a novel approach to identify interpretable emergent properties of mid-level representations in deep neural network (DNN) models of visual cortex. Using this approach, we examined DNN models that were fit to scene-evoked fMRI responses in category-selective regions of visual cortex, specifically scene-selective cortex (sceneDNN) and object-selective cortex (objectDNN). Our method uses a semantically guided image-occlusion procedure to systematically characterize how DNN activations are driven by the classes of objects within a scene. We examined the relationship between mid-level features and several object properties that have previously been associated with response preferences in visual cortex: curvature, real-world size, animacy, naturalness, and spatial stability. We found that while mid-level features appear complex and difficult to describe at a surface level, large-scale computational analyses can reveal a latent underlying relationship to interpretable object properties. Specifically, we found that the mid-level representations of the sceneDNN support a latent preference for objects that are boxy and large in real-world size. In contrast, mid-level representations of the objectDNN support a complementary preference for objects that are curvy and small in real-world size. These effects were robust to variations of model hyperparameters and were reproducible across different DNN models.
Our findings show that curvature and real-world size are emergent organizing principles of mid-level visual representation, and they suggest that differences in mid-level feature tuning may be critical for understanding the organization of visual cortex into category-selective patches.